Systems and Methods for Automated Generation of Passage-Based Items for Use in Testing or Evaluation

ABSTRACT

Systems, apparatuses, and methods for automatically generating text items that may be used on an exam or test. In some embodiments, the text items may take the form of a question or statement. The exam or test may be used for evaluating a test-taker's knowledge, proficiency, reading comprehension, or other similar purpose or goal.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application No. 63/243,857, filed Sep. 14, 2021, and titled “Systems and Methods for Automated Generation of Passage-Based Items for Use in Testing or Evaluation”, the disclosure of which is incorporated in its entirety by this reference.

BACKGROUND

The advent of internet-based computerized skills assessment offers many advantages compared to more traditional paper-based assessments. These include support for innovative item types and alternative item formats, measurement of more complex knowledge, skills, and competencies, automated scoring which also allows immediate feedback to students, and adaptive testing and testing on-demand. These advantages have resulted in increased summative as well as formative testing and, consequently, the challenge of needing higher levels of test item development to support these new features of testing.

One method that may be used to address this challenge is automatic item generation. Automatic item generation (AIG) is typically based on the notion of item “models” (Bejar 2002), that is, schemas of problems with parameters that can be instantiated with specific values. For example, the model X+Y=?, where X and Y can be any whole numbers in the range 0-9, has two parameters. An item model can be instantiated, using a computer-based algorithm, to display an actual test or evaluation item (in this case, a single-digit addition exercise).
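
As a minimal illustration of instantiating an item model, the following Python sketch enumerates every instance of the model X+Y=? for whole numbers in the range 0-9. The function name and the dictionary fields ("stem" and "key") are hypothetical and chosen only for this illustration; they are not taken from any particular AIG system.

```python
import itertools


def instantiate_addition_model(low=0, high=9):
    """Enumerate every instance of the item model X+Y=? with X, Y in [low, high]."""
    items = []
    for x, y in itertools.product(range(low, high + 1), repeat=2):
        # "stem" is the text shown to the test taker; "key" is the correct answer.
        items.append({"stem": f"{x} + {y} = ?", "key": x + y})
    return items


items = instantiate_addition_model()
print(len(items))  # 100 (10 values for X times 10 values for Y)
print(items[-1])   # {'stem': '9 + 9 = ?', 'key': 18}
```

Because the model has two parameters with ten possible values each, a single model yields one hundred distinct single-digit addition items.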

Automatic Item Generation (AIG) has been used to create items in diverse content areas and formats. From a practical standpoint, the use of item models expands the potential number of items that may be generated. From a theoretical standpoint, item models provide an opportunity for a construct-driven approach to item development because the generated items can be mapped to the construct through an analysis of the cognitive mechanisms related to the item solution and item features that call on these mechanisms.

However, an important limitation of the item model approach and its conventional implementation is that it is limited in scope to content areas that are sufficiently easy to model (such as mathematics). In addition, because the item generation process depends on highly skilled content experts to create the models, the AIG process is semi-automated and can be costly to implement in terms of the required resources.

As a result, use of AIG as a technique to populate test or examination content is limited to simple tasks. An example of a complex type of task that is not amenable to AIG is a reading comprehension task, which is the most common method used for assessment of higher-level verbal skills and abilities. For example, Sphinx, which is typically considered the most advanced system for the automated generation of reading comprehension assessments, is a hybrid system that automatically generates reading texts. However, the system relies on human experts to generate reading comprehension questions, and item models to generate simple grammatical questions, such as sentence fragment correction questions.

Embodiments of the disclosure overcome these and other disadvantages of conventional approaches to generating content for evaluating the performance of a test-taker, both collectively and individually.

SUMMARY

The terms “invention,” “the invention,” “this invention,” “the present invention,” “the present disclosure,” or “the disclosure” as used herein are intended to refer broadly to all the subject matter disclosed in this document, the drawings or figures, and to the claims. Statements containing these terms do not limit the subject matter disclosed or the meaning or scope of the claims. Embodiments covered by this disclosure are defined by the claims and not by this summary. This summary is a high-level overview of various aspects of the disclosure and introduces some of the concepts that are further described in the Detailed Description section below. This summary is not intended to identify key, essential or required features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification, to any or all figures or drawings, and to each claim.

This disclosure is directed to systems, apparatuses, and methods for automatically generating text items that may be used on an exam or test. In some embodiments, the text items may take the form of a question or statement. The exam or test may be used for evaluating a test-taker's knowledge, proficiency, understanding, comprehension, or other similar purpose or goal.

In one embodiment, the disclosure is directed to a method (and associated system, platform, or apparatus) for using a Transformer-based Language Model (for example, Generative Pretrained Transformer 3 or GPT-3) to automatically generate a set of content-controlled passages, and test items for each passage that can be used to evaluate a test taker's comprehension of the text. These test questions or items may be used to probe a test taker's comprehension of the text and their ability to synthesize and distill the information beyond simple vocabulary or standard questions (what, where, who, or why, as examples) that are generated using templates or conventional automated methods. Further, the test items are generated in a way that enables them to be automatically scored, thereby reducing or eliminating the need for human resources as part of the test generation and scoring processes.

In one embodiment, the disclosure is directed to a method for automatically generating and selecting a set of test or examination items. In one embodiment, the general form of the data processing flow and associated logic implemented as part of the method may comprise:

-   Generate a set of text “Source Passages” using a Transformer-based Language Model by providing:
    -   A set of Instructions (i.e., goals or general characteristics of the desired output) for the model;
        -   Non-limiting examples of goals or characteristics include
            -   Format of a generated passage;
            -   Length of a generated passage;
            -   Style or comprehension level of a generated passage;
    -   A set of “Examples” for use in “seeding” the transformer-based model;
        -   Each example is text and may be associated with one or more of a desired format, subject matter, or style;
        -   Each example may be associated with a set of attributes and values (of the attribute) used to further describe the characteristics which the example represents;
    -   A relevant “Conditioning”, i.e., a set of attributes used to control the final characteristics of the generated text;
        -   Examples of a “condition” include but are not limited to or required to include topic, sub-topic, names of characters, argumentative position (e.g., supports, opposes), length (e.g., number of paragraphs or sentences), a set of keywords to incorporate into the text, or a narrative perspective (e.g., 1st or 3rd person);
-   Filter the generated Source Passages to identify/select those that are best suited for generating test items;
    -   This filtering may be based on a rule-set, one or more criteria related to content, subject matter, or another relevant attribute, as examples;
-   Generate one or more alternative passages for the selected source passage using one or more of the attributes and associated values of the selected source passage as the conditioning for the transformer-based language model;
    -   This leads to the generation of a set of topically and stylistically similar related “Alternative Passages” with distinct content for each selected Source Passage;
    -   This may be done using the process for generating Source Passages and the same Conditioning attributes as those used to generate the selected Source Passage (or a subset of those attributes);
    -   In one embodiment of this process, the Source passage is used as the example for the language model;
    -   In some embodiments, generating alternative passages relies on the non-deterministic nature of the language model to generate passages different from the source passage, even given the same inputs;
-   Based on the selected Source Passage(s), for each such passage, generate a set of possible correct responses to one or more multiple-choice questions using a Transformer-Based Language Model (or Models) by providing:
    -   A set of Instructions for the model;
    -   A set of Examples to demonstrate the task;
        -   Where each example consists of a text passage and a set of one or more possible correct answers to a multiple-choice question based on the passage;
    -   The selected Source Passage for which correct answers will be generated;
        -   For generating the possible correct answers to the question, the source passage itself is used as the conditioning;
-   Using the set of Alternative Passages generated or corresponding to each selected Source Passage, generate a set of possible incorrect responses to each of the multiple-choice questions referred to in the previous step using a Transformer-based Language Model by providing:
    -   The same Instructions and Examples used to generate the possible correct response(s) for the selected Source Passage;
    -   The Alternative Passage for which answers will be generated as the Conditioning;
-   Filter the generated test items (the possible correct and/or incorrect responses) to remove categories of subject matter that are not appropriate or do not satisfy one or more desired metrics;
    -   For possible correct responses, non-limiting examples of such metrics include:
        -   similarity to the source text as estimated by vector similarities of the text encoded by a separate language model;
        -   similarity to individual sentences within the source text via the above mechanism;
        -   N-gram overlap with the source text;
        -   probability of the generated answer under the transformer-based language model;
        -   probability of being correct as estimated by a separately trained question answering model;
        -   length; or
        -   presence of overly rare words;
    -   For possible incorrect responses, non-limiting examples of such metrics include:
        -   similarity to the source text;
        -   similarity to the chosen correct answer;
        -   similarity to unchosen potential correct answers;
        -   similarity to other incorrect answers;
        -   difference in probability of the generated text as measured by the output distribution over tokens of the transformer-based language model between the incorrect answer and the correct answer;
        -   length relative to the chosen correct answer; or
        -   presence of overly rare words; and
-   Construct a set of test items using the selected correct response (or responses) generated for each selected Source Passage as the correct answer(s) and the selected incorrect responses generated for the Corresponding Alternative Passage(s) as the incorrect answer(s) for each of the multiple-choice questions for which answers were generated.
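
As a non-limiting sketch of one of the filtering metrics listed above (N-gram overlap with the source text, combined with a length check), the following Python example scores each candidate response by the fraction of its word bigrams that also appear in the source passage and discards candidates that exceed an overlap threshold or a length limit. The whitespace tokenization, the function names, and the threshold values are assumptions made for illustration only; an embodiment may use other tokenizers, metrics, or thresholds.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text (whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(candidate, source, n=2):
    """Fraction of the candidate's n-grams that also occur in the source."""
    candidate_ngrams = ngrams(candidate, n)
    if not candidate_ngrams:
        # Candidates shorter than n words have no n-grams to compare.
        return 0.0
    return len(candidate_ngrams & ngrams(source, n)) / len(candidate_ngrams)


def filter_candidates(candidates, source, max_overlap=0.5, max_words=20):
    """Keep candidates that are neither near-copies of the source nor too long."""
    return [
        c for c in candidates
        if ngram_overlap(c, source) <= max_overlap and len(c.split()) <= max_words
    ]


source = "John bought a red sports car from a local dealership"
candidates = [
    "John bought a red sports car",          # copied verbatim from the source
    "He finally chose an expensive vehicle"  # paraphrase with no shared bigrams
]
print(filter_candidates(candidates, source))
```

In this sketch the first candidate is removed because every one of its bigrams appears in the source, while the paraphrased candidate is retained.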

In one embodiment, the disclosure is directed to a system for automatically generating and selecting a set of test or examination items. The system may include a set of computer-executable instructions stored in a memory or other data storage element (such as a non-transitory computer-readable medium) and one or more processors or co-processors. When executed by the processors or co-processors, the instructions cause the processors or co-processors (or a device or devices of which they are part) to perform a set of operations that implement an embodiment of the disclosed method or methods.

In one embodiment, the disclosure is directed to a non-transitory computer-readable medium containing a set of computer-executable instructions, wherein when the set of instructions are executed by one or more electronic processors or co-processors, the processors or co-processors (or a device or devices of which they are part) perform a set of operations that implement an embodiment of the disclosed method or methods.

In some embodiments, the systems and methods disclosed herein may provide services through a SaaS or multi-tenant platform. The platform provides access to multiple entities, each with a separate account and associated data storage. Each account may correspond to a user, a set of users, an entity, a set or category of entities, a set or category of users, a set or category of data, an industry, or an organization, for example. Each account may access one or more services, a set of which are instantiated in their account, and which implement one or more of the methods or functions disclosed herein.

Other objects and advantages of the systems, apparatuses, and methods disclosed will be apparent to one of ordinary skill in the art upon review of the detailed description and the included figures. Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical elements. While the embodiments disclosed or described herein are susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are described in detail herein. However, the exemplary or specific embodiments are not limited to the forms described or intended to limit the set of possible embodiments. Rather, the present disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the disclosure are described with reference to the drawings, in which:

FIG. 1 is a flowchart or flow diagram illustrating a method, process, operation, or set of functions that may be used in implementing an embodiment of the disclosure;

FIG. 2 is a diagram illustrating elements or components that may be present in a computer device or system configured to implement a method, process, function, or operation in accordance with an embodiment of the system and methods disclosed herein; and

FIGS. 3-5 are diagrams illustrating a deployment of the system and methods described herein as a service or application provided through a Software-as-a-Service platform, in accordance with some embodiments.

Note that the same numbers are used throughout the disclosure and figures to reference like components and features.

DETAILED DESCRIPTION

One or more embodiments of the disclosed subject matter are described herein with specificity to meet statutory requirements, but this description does not limit the scope of the claims. The claimed subject matter may be embodied in other ways, may include different elements or steps, and may be used in conjunction with other existing or later developed technologies. The description should not be interpreted as implying any required order or arrangement among or between various steps or elements except when the order of individual steps or arrangement of elements is explicitly noted as being required.

Embodiments of the disclosed subject matter will be described with reference to the accompanying drawings, which show by way of illustration, example embodiments by which the disclosed systems, apparatuses, and methods may be practiced. However, the disclosure may be embodied in different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy the statutory requirements and convey the scope of the disclosure to those skilled in the art.

Among other forms, the subject matter of the disclosure may be embodied in whole or in part as a system, as one or more methods, or as one or more devices. Embodiments may take the form of a hardware implemented embodiment, a software implemented embodiment, or an embodiment combining software and hardware aspects. For example, in some embodiments, one or more of the operations, functions, processes, or methods described herein may be implemented by a suitable processing element or elements (such as a processor, microprocessor, co-processor, CPU, GPU, TPU, QPU, state machine, or controller, as non-limiting examples) that are part of a client device, server, network element, remote platform (such as a SaaS platform), an “in the cloud” service, or other form of computing or data processing system, device, or platform.

The processing element or elements may be programmed with a set of executable instructions (e.g., software instructions), where the instructions may be stored on (or in) one or more suitable non-transitory data storage elements. In some embodiments, the set of instructions may be conveyed to a user over a network (e.g., the Internet) through a transfer of instructions or an application that executes a set of instructions.

In some embodiments, the systems and methods disclosed herein may provide services to end users through a SaaS or multi-tenant platform. The platform provides access to multiple entities, each with a separate account and associated data storage. Each account may correspond to a user, a set of users, an entity, a set or category of entities, a set or category of users, a set or category of data, an industry, or an organization, for example. Each account may access one or more services (such as applications or functionality), a set of which are instantiated in their account, and which implement one or more of the methods, processes, operations, or functions disclosed herein.

In some embodiments, one or more of the operations, functions, processes, or methods disclosed herein may be implemented by a specialized form of hardware, such as a programmable gate array, application specific integrated circuit (ASIC), or the like. Note that an embodiment of the disclosed methods may be implemented in the form of an application, a sub-routine that is part of a larger application, a “plug-in”, an extension to the functionality of a data processing system or platform, or other suitable form. The following detailed description is, therefore, not to be taken in a limiting sense.

Embodiments are directed to systems, apparatuses, and methods for automatically generating text items that may be used on an exam or test. This may be followed by one or more filtering or selecting steps to provide a set of test or exam elements, where each element may be part of an evaluation of a test taker's reading comprehension. In one embodiment, the disclosed system and associated methods provide an end-to-end process to automatically generate reading comprehension texts and accompanying questions, and where a test taker's responses can be automatically scored or evaluated.

In contrast to conventional approaches and instead of depending on input from content experts, the described approach is based on using a Transformer-Based Language Model, such as GPT-3, as part of generating the test elements. However, the described approach is not limited to using GPT-3 and could be utilized with other Transformer-based Language Models. Such other models include GPT-2, Pathways Language Model (PaLM), BigScience Large Open-science Open-access Multilingual (BLOOM), or Open Pretrained Transformer (OPT-175B), as non-limiting examples.

In some embodiments, the language model is used to create related texts and other materials to provide reading passages, tasks, correct answers, distractors (i.e., incorrect responses for selected-response items), and other information that may be used to enable automated scoring of a test or exam. In one embodiment, certain materials (e.g., passages and accompanying questions) may be placed on a practice test to collect test taker responses. This form of test data may be used to fine-tune the tasks and/or scoring models implemented by an embodiment of the disclosed system.

In one embodiment, the disclosed approach provides an automated end-to-end process that generates high-level reading comprehension questions (e.g., questions about the main point of a passage or factual questions about the passage) that can be automatically scored. A significant difference between the disclosed approach and conventional state-of-the-art AIG systems is that those systems are limited to text generation with limited question generation capabilities, whereas the disclosed approach enables a full range of test-related tasks, from text generation, to question generation, to enabling automatic scoring of test-taker responses.

For example, FineTune's solutions are presented as “dramatically speeding up assessment item authoring” by generating possible multiple-choice options but do so without a distinction between correct and incorrect responses. As another example, U.S. Pat. No. 11,023,684, assigned to Educational Testing Service, listing Flor et al. as inventors and titled “Systems And Methods For Automatic Generation Of Questions From Text”, is focused only on question generation from a section of text, without considering text generation or automatic scoring functions. In addition, the system described in the '684 patent generates a relatively large number of possible or candidate questions (approximately one per sentence), leaving the task of sorting through the questions to decide which are most useful to a subject matter expert (SME). Finally, the questions generated by the Flor system are limited to yes-no questions and therefore may not represent test items that are sufficiently effective at evaluating reading comprehension or an understanding of the text.

Further details regarding one or more embodiments of the disclosed system or platform for automatically generating text items and filtering or selecting the generated items to provide a set of test or exam elements are presented in the following sections.

In one embodiment, passages (sections of text for reading by a test-taker) are generated by providing a set of hand-crafted or automatically extracted examples that satisfy a particular format, subject, narrative style, or other characteristic(s). Herein, “extracted” refers to the passages being contained in an existing text corpus that matches a set of desired characteristics, as opposed to being deliberately written by a person to match the desired characteristics. From an existing corpus, texts of varying lengths (sentences, paragraphs, or multi-paragraph segments, as examples) can be extracted using standard text processing methods.

The example passages are manually or automatically labeled with specific attributes (e.g., the name or type of format, or a keyword for the subject matter, as non-limiting examples) as a preliminary step. Examples can then be grouped based on one or more common labels or attributes, with the remainder of the labels used as additional conditioning information when test passages are being generated.

For example, one set of examples may be news articles, each with a different topic. Another set of examples may be a set of short narrative stories, each associated with a title and a set of characters. A third set of examples may be multiple text passages concerning a common topic, each associated with a particular source domain (such as news, textbooks, or blog posts, as examples).
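
The grouping of labeled examples described above may be sketched as follows. In this illustrative Python example, each labeled example is grouped by one shared attribute and its remaining labels are retained as conditioning information. The data layout (a dictionary with "text" and "labels" fields), the function name, and the sample texts are hypothetical and are used only to make the idea concrete.

```python
from collections import defaultdict


def group_examples(examples, group_by):
    """Group labeled examples by one attribute; keep the other labels as conditioning."""
    groups = defaultdict(list)
    for example in examples:
        labels = example["labels"]
        # Labels not used for grouping remain available as conditioning information.
        remaining = {name: value for name, value in labels.items() if name != group_by}
        groups[labels[group_by]].append(
            {"text": example["text"], "conditioning": remaining}
        )
    return dict(groups)


examples = [
    {"text": "Voters went to the polls on Tuesday.",
     "labels": {"format": "news", "topic": "elections"}},
    {"text": "A storm is expected this weekend.",
     "labels": {"format": "news", "topic": "weather"}},
    {"text": "Mia found a map in the attic.",
     "labels": {"format": "story", "title": "The Map", "characters": "Mia"}},
]

grouped = group_examples(examples, "format")
print(sorted(grouped))  # ['news', 'story']
```

Here the two news articles form one group, each keeping its own topic, and the short story forms a second group that keeps its title and characters.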

In some embodiments, the remaining attributes (those not used for a grouping of the examples) may be used to provide more direct control over the qualities of the generated text. For example, an attribute can be used to control the topic or domain of the text, the register (as an example, the level of formality, which may be determined by situational characteristics (e.g., relationship among interlocutors/participants, channel mode and medium, language production circumstances, communicative purpose, or topic, as examples), which in turn affects the type of language and choice of words used in that register), or the format (e.g., dialogue, paragraph structure, or an enumerated list, as examples). This provides a flexible approach for generating text with a range of qualities that may be useful for defining test items with different characteristics or goals (e.g., comprehension of different levels of reading texts, organizing information, or identifying supporting and opposing arguments, as examples). The labels describing the properties of the texts can be combined with the texts to serve as part of the input to a transformer-based language model.

In some embodiments, the input(s) to a Transformer-Based Language Model for generating a type of test or evaluation content may comprise three parts:

1.  A set of plain-text instructions, describing the generative task (referred to as “Instructions”);
2.  A set of examples of the type of content to generate (referred to as “Examples”); and
3.  A partially completed example used to conditionally generate the remaining, incomplete parts based on the example(s) (referred to as “Conditioning”).

In one example, the “Instructions” are a human-readable description of the type of content it is desired to generate, in some cases along with other basic information. While not necessary, the Instructions have been shown to improve performance of a model on certain tasks. An example set of Instructions for generating academic texts about a given topic (which is provided as one of the attributes in the “Examples” and “Conditioning”) may be:

“Generate short paragraphs from high school textbooks on the specified topic.”

Note that this Instruction provides the desired format (short paragraphs), level (high-school textbook), and subject matter for the output (the specific topic).

A Transformer-Based Language Model uses the provided Examples to determine the implicit attributes for the generated text. As non-limiting examples, these attributes may include word choice, grammatical structure, and the organization of the content. As mentioned, Examples can have multiple attribute-value pairs, but each Example should have all its attribute-value pairs specified, and the set of attribute-value pairs should be identical for all of the Examples (i.e., for a given task, all Examples should have all attributes in common).

The names and values of the attributes are organized by the order in which the model should generate and/or use the values. For example, to generate a news passage directed to a given topic, a set of news passage Examples might be ordered such that the “Topic” attribute comes first, followed by the “Text” of the Example passage. In this way, the “Topic” attribute can be understood by the model as being connected to the content of the Example's “Text”. In some embodiments, two or more Examples may be joined into a single string of text, connected by a delimiter of a symbol or symbols to indicate a break between the Examples.
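
One possible way to assemble such an input is sketched below in Python. The sketch renders each Example as "Name: value" lines in a fixed attribute order, joins the Examples with a delimiter, and appends a partially completed entry whose first unspecified attribute is left open for the model to complete. The function names, the "Name: value" line layout, and the placement of the three-equal-sign delimiter on its own line are assumptions made for illustration; no call to any particular language model API is shown.

```python
DELIMITER = "\n===\n"  # assumption: three equal signs on their own line


def render_example(example, attribute_order):
    """Render one Example as 'Name: value' lines in the given attribute order."""
    return "\n".join(f"{name}: {example[name]}" for name in attribute_order)


def build_prompt(instructions, examples, attribute_order, conditioning):
    """Join Instructions, Examples, and a partially completed Conditioning entry."""
    blocks = [render_example(example, attribute_order) for example in examples]
    lines = []
    for name in attribute_order:
        if name not in conditioning:
            # First unspecified attribute: leave it open for the model to complete.
            lines.append(f"{name}:")
            break
        lines.append(f"{name}: {conditioning[name]}")
    blocks.append("\n".join(lines))
    return instructions + "\n\n" + DELIMITER.join(blocks)


prompt = build_prompt(
    instructions="Generate news passages on the specified topic.",
    examples=[{"Topic": "Elections", "Text": "Voters went to the polls on Tuesday."}],
    attribute_order=["Topic", "Text"],
    conditioning={"Topic": "Weather"},
)
print(prompt)
```

Placing "Topic" before "Text" in the attribute order means the model sees the topic before it generates the passage text, which mirrors the ordering rationale given above.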

The use of examples as an input to a model is sometimes referred to by researchers as a “few-shot” generation scenario, in which the model is provided with only a small number of examples of the desired output format before being operated to generate the output. In other embodiments, no examples may be provided to the model in what is referred to as a “zero-shot” scenario. In a zero-shot scenario, only the Instructions would be provided to the model as an input and all relevant details used to condition the output of the model would be provided in the Instructions.

The Conditioning is a partially (or fully) incomplete Example, which is used to exert more explicit control over the generated content. The attribute-value pairs in the Conditioning should match the order of the attribute-value pairs in the Examples, and attribute-value pairs that are defined in the Conditioning should come in sequential order. In other words, there should be no “gaps” in the attributes defined in the Conditioning when following the order defined in the Examples. Attributes without a corresponding value will be generated by the Transformer-Based Language Model in the same order as the attributes in the Examples. Because the attribute-value pairs of the Conditioning are used to control the generation of subsequent attributes, this affords a higher degree of control over the generated content.

For example, the Conditioning may be a set of attributes that are used to exert more explicit control over the created content, such as the topic and sub-topic(s), or the characters mentioned in a generated passage, as examples. In this case, the Conditioning attributes should be a subset of the attributes present in the Examples and may be used to specify a value or values for those attributes. The order of the attributes in the Conditioning should match the order of the attributes in the Examples.

If an attribute from the Examples is unspecified in the Conditioning, then the Transformer-Based Language Model will generate a value for that attribute as part of its content generation process and then use the generated value as part of the Conditioning for the generated text. In this way, the content of the generated passages can be controlled in whole or in part by the user or left to the Transformer-Based Language Model, depending on the specific use case, goal, or task.
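
The ordering and “no gaps” constraints on the Conditioning can be checked mechanically before a prompt is built. The following Python sketch, offered as an illustration only, verifies that the specified attributes are a subset of the Example attributes and form an unbroken prefix of the attribute order, and it reports which attributes are left for the model to generate. The function name and the error messages are hypothetical.

```python
def split_conditioning(attribute_order, conditioning):
    """Return (given, to_generate), or raise ValueError if the Conditioning is invalid."""
    unknown = set(conditioning) - set(attribute_order)
    if unknown:
        # Conditioning attributes should be a subset of the Example attributes.
        raise ValueError(f"attributes not present in the Examples: {sorted(unknown)}")
    given = [name for name in attribute_order if name in conditioning]
    if given != attribute_order[:len(given)]:
        # A later attribute is specified while an earlier one is missing.
        raise ValueError("Conditioning has a gap in the attribute order")
    return given, attribute_order[len(given):]


order = ["Topic", "Characters", "Title", "Text"]
given, to_generate = split_conditioning(order, {"Topic": "Cars", "Characters": "John"})
print(given)        # ['Topic', 'Characters']
print(to_generate)  # ['Title', 'Text']
```

With this order, specifying only "Topic" and "Title" would be rejected because "Characters" would be skipped, whereas specifying nothing at all is accepted and leaves every attribute to the model.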

In one embodiment, a value (or values) of the attributes in the Conditioning may be determined by a set of heuristics, a rule-based system, or by a machine learning model. For example, a trained classifier may be used to automatically label food reviews with whether a review is positive or negative (and thereby identify the sentiment associated with the review). Those labels may then be used in conjunction with a passage as an Example for a Transformer Model to cause the model to generate a positive review of another, separate product. In this case, an advantage over conventional approaches is that one could generate values for various types of attributes automatically without requiring manual review.

A user could likewise use a machine learning model to verify that the generated text uses or possesses an attribute. For example, this may be done by applying the same classifier model described above to a passage generated using a “positive review” label and confirming that the classifier also labels the generated passage as a “positive review”.

For the attributes or characteristics themselves (i.e., the fields or features), one approach is to extend the method described above for generating attribute values. In addition to using a set of heuristics, a rule-based system, or a machine learning model to generate labels for the Examples, one could also use the same technique(s) to generate labels for generated texts. If there is a significant mismatch (or occurrence of mismatches) between the attribute value used to generate the text and a model's predicted label for the text (where the predicted label is expected to indicate the attribute type or class), then this may indicate that the attribute itself is uninformative to the Transformer-based Language Model or is not clearly enough expressed in the Examples. One could use that information to cease using the attribute or to identify Examples which better exemplify the attribute. An advantage of the approach in this case is the ability to identify and validate useful/informative attributes.
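
The mismatch check described above may be sketched as a simple rate computed over generated passages. In the following Python example, each record pairs the attribute value requested in the Conditioning with the text that was generated, and a classifier is applied to each text to see whether its predicted label agrees. The keyword-based classifier is a deliberately crude, hypothetical stand-in for a trained model, and the sample texts are invented for illustration; what rate counts as a significant mismatch is left to the practitioner.

```python
def mismatch_rate(records, classify):
    """Fraction of generated texts whose predicted label differs from the requested value.

    records: list of (requested_value, generated_text) pairs.
    classify: function mapping a text to a predicted attribute value.
    """
    if not records:
        return 0.0
    mismatches = sum(1 for requested, text in records if classify(text) != requested)
    return mismatches / len(records)


def toy_sentiment(text):
    """Hypothetical stand-in for a trained sentiment classifier (keyword counting)."""
    positive = {"great", "love", "excellent"}
    negative = {"bad", "terrible", "broke"}
    tokens = text.lower().split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return "positive" if score >= 0 else "negative"


records = [
    ("positive", "I love this great vacuum"),
    ("positive", "It broke and was terrible"),
    ("negative", "A bad purchase"),
    ("negative", "An excellent product"),
]
print(mismatch_rate(records, toy_sentiment))  # 0.5
```

A rate this high would suggest that the sentiment attribute is not being used reliably by the generating model, or is not expressed clearly enough in the Examples.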

As non-limiting examples of possible attributes and corresponding values:

-   sentiment (positive, negative);
-   domain (academic, web text, news, conversations, etc.);
-   reading level (elementary, middle, high school or college);
-   topic (science, math, political science, law, etc.); or
-   basic observable properties of the text such as number of words, number of paragraphs, presence or absence of specific named persons or entities, or presence or absence of keywords.

Identifying informative attributes prior to large-scale generation of text can help practitioners avoid generating large volumes of text based on an attribute that is not used (or not used reliably) by a model. Having a larger number of useful attributes also helps when generating Alternative Passages, as the more attributes that are defined, the more the model will have access to when defining the Conditioning used to generate the Alternative Passages so that they better match a Source Passage.

For example, if the Source Passage were a positive product review for a vacuum and generated based on the Conditioning that the review should be about a “vacuum”, and the generated Alternative Passages were all negative reviews of a vacuum, then answers generated from the Alternative Passages might be too easy to identify.

An Example of an input that might be passed to a Transformer-Based Language Model to generate text in a narrative style is shown below. Examples may be delimited using three equal signs (===). In the following, the Instructions are denoted in bold, the Examples in italics, and the Conditioning is underlined.

Generate Short Stories Based on the Topic, Characters, and Title.

Topic: Cars
Characters: John
Title: John's New Car

Generated Passage: John was looking to buy a new car but had no idea what kind he wanted. At first, he thought he should buy a basic sedan like his last car, but recently he had been thinking about splurging and buying a fancier sports car. However, sports cars were expensive, and John was worried about how much money he would have to have for a down payment. However, one night, John saw a commercial on TV for a local dealership that promised no down payment was necessary. John headed over the next day and ended up leaving with a fancy red sports car.

Topic: Sports
Characters: Nora, Travis
Title: How Nora Learned to Snowboard

Passage: [text of passage]

. . . Additional examples . . .

Topic: Science
Characters:

Using the above prompt (the Conditioning), the Transformer-based Language Model will generate a list of characters, a title, and a passage all centered around a topic of Science in a style similar to that of the text of the Example passages. The model will output this information in a single response text block, from which the attribute or other values can be extracted. Additionally, a validation process may be performed on the response to ensure that it follows the expected format. An example of an expected response from the Transformer-Based Language Model is shown below:

Sam, Jane
Title: Sam and Jane's Big Discovery

Passage: [text of passage]

Note that the first line of the response corresponds to the first attribute in the Conditioning with an unspecified value (“Characters”, in this case).
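As a non-limiting illustration, the prompt layout and the response validation described above may be sketched in Python as follows. The field order in `FIELDS`, the placement of the three-equal-sign delimiter between every block, and the function names `build_prompt` and `parse_response` are assumptions made for this sketch; no particular language model API is implied, and the model call itself is omitted.

```python
import re
from typing import Dict, List, Optional

DELIMITER = "==="  # three equal signs between blocks (placement is an assumption)
FIELDS = ["Topic", "Characters", "Title", "Passage"]  # assumed attribute order


def build_prompt(instructions: str, examples: List[Dict[str, str]],
                 conditioning: Dict[str, str]) -> str:
    """Join the Instructions, the Examples, and the open-ended Conditioning."""
    blocks = [instructions]
    for ex in examples:
        blocks.append("\n".join(f"{k}: {ex[k]}" for k in FIELDS))
    lines = []
    for k in FIELDS:
        if k in conditioning:
            lines.append(f"{k}: {conditioning[k]}")
        else:
            lines.append(f"{k}:")  # first unspecified attribute is left open
            break
    blocks.append("\n".join(lines))
    return f"\n{DELIMITER}\n".join(blocks)


def parse_response(response: str, open_field: str) -> Optional[Dict[str, str]]:
    """Extract the remaining fields, or return None if the format is unexpected."""
    text = f"{open_field}: {response.strip()}"
    remaining = FIELDS[FIELDS.index(open_field):]
    pattern = r"\s*".join(
        rf"{re.escape(f)}:\s*(?P<{f}>.+?)" for f in remaining
    ) + r"\s*$"
    match = re.match(pattern, text, flags=re.DOTALL)
    return {f: match.group(f).strip() for f in remaining} if match else None
```

Because the response begins with the value of the open attribute, the parser first prepends that attribute's name and then requires every later field label to appear in order; a response that omits a label is rejected instead of being partially accepted.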

To determine which of a set of generated passages are suitable (or most suitable) to use for generating test items, the passages may be evaluated and filtered based on one or more of a multitude of criteria, including but not limited to (or required to include) the following:

-   Minimum/Maximum number of sentences;
-   Minimum/Maximum number of words;
-   Minimum/Maximum number of characters;
-   Presence of duplicated words/phrases/sentences;
-   Presence of extremely rare words;
-   Presence of potentially offensive/inappropriate words/phrases/sentences;
-   Presence of punctuation or grammatical errors;
-   A measure of the expected difficulty of the passage, as may be estimated by an external machine learning model;
    -   One embodiment of this type of model is a regression model trained to predict the difficulty of a passage as expressed on the Common European Framework of Reference (CEFR) scale based on features such as word and sentence length, word-level probabilities, and Fisher score features¹;
-   An estimate of the approximate likelihood of a phrase or sentence in the passage occurring in other situations or from other sources; and/or
-   An evaluation of how well the passage reflects the targeted domains/registers.

¹ See Machine Learning-Driven Language Assessment, Burr Settles, Geoffrey T. LaFlair, Masato Hagiwara, Transactions of the Association for Computational Linguistics (2020) 8: 247-263.

Passages that do not meet one or more of these automatically applied criteria and/or other manually defined criteria are typically excluded from further steps in the processing flow or pipeline. After this filtering of the generated passages, they may undergo a two-step process that includes copy-editing, followed by a fairness and bias review. In the copy-editing stage, passages may be edited or modified by automated processes and/or human editors to account for mechanical or grammatical errors as well as inconsistencies in the logical flow of ideas or cohesive aspects (such as pronouns). Next, passages may additionally be vetted by human reviewers for potential fairness or bias problems. This type of issue can range from inappropriate or potentially offensive content to passages that would unfairly advantage or disadvantage a particular group of test-takers due to systemic differences in knowledge of (or familiarity with) the content.
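As a non-limiting illustration, a few of the automatically applied criteria (sentence count, word count, duplicated sentences, and a word blocklist) may be sketched in Python as follows. The thresholds, the naive sentence splitter, and the names `split_sentences` and `passes_filters` are assumptions for this sketch; criteria such as difficulty estimation or grammatical checking would require additional models and are not shown.

```python
import re
from typing import AbstractSet, List


def split_sentences(passage: str) -> List[str]:
    # Naive splitter: breaks after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [s.strip() for s in parts if s.strip()]


def passes_filters(passage: str,
                   min_sentences: int = 3, max_sentences: int = 12,
                   min_words: int = 30, max_words: int = 200,
                   blocklist: AbstractSet[str] = frozenset()) -> bool:
    """Return True only if the passage satisfies every configured criterion."""
    sentences = split_sentences(passage)
    words = re.findall(r"[A-Za-z']+", passage.lower())
    if not (min_sentences <= len(sentences) <= max_sentences):
        return False
    if not (min_words <= len(words) <= max_words):
        return False
    if len({s.lower() for s in sentences}) < len(sentences):
        return False  # at least one sentence is repeated verbatim
    if blocklist & set(words):
        return False  # contains a word flagged as inappropriate
    return True
```

Passages rejected by such a function would be excluded before the copy-editing and fairness review stages, which remain human or separately automated steps.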

In some embodiments, the general steps or stages of a process for generating test items are as follows. Given a passage (termed a Source Passage):

1. Define one or more item generation templates, with each template consisting of a set of Instructions and Examples for generating questions and/or answers that can be used as input for a Transformer-Based Language Model;

    a. This process and example characteristics of a template are described further herein;

2. Generate a set of passages that are similar to the Source Passage using the above-described passage generation process and defining the Conditioning as a set of attributes and values that are identical to, similar to, or a subset of those of the Source Passage. These new passages (referred to as Alternative Passages) will be similar in style and content to the Source Passage as a result, making them suitable for use in generating incorrect answers for a multiple-choice item (a multiple-choice test question);

    -   Theoretically, an Alternative Passage could be used as a Source Passage. For example, someone could compare the Source and Alternative Passages to identify the “best” passage (e.g., by the filtering/labeling methods described), but in some cases, that may not be necessary. The Alternative Passages could be administered as items on different versions of the same test, as an example, to reduce opportunities for cheating on a test;
    -   With regard to defining the Conditioning, if a Source Passage was generated using the following Conditioning:
        -   “Topic”: “Science”
        -   “Title”: “Basics of Quantum Mechanics”,

        then the Alternative Passages would be generated using that same Conditioning. Alternative Passages could also be generated using a subset of the attributes, such as:
        -   “Topic”: “Science”
    -   It is also possible to adjust the Conditioning used to generate the Alternative Passages in a way that would not substantially change the overall goal of the intended content. How this would be done depends on the attributes used; for example, Alternative Passages could use something similar to the following:
        -   “Topic”: “Physics”
        -   “Title”: “Basics of Quantum Mechanics”, or
        -   “Topic”: “Science”
        -   “Title”: “Quantum Mechanics”;

3. Generate a set of possible answers to the item using the Instructions and Examples defined in step 1 and the Source Passage as the Conditioning for the Transformer-Based Language Model. These generated answers will be the pool of potential correct answers (termed Correct Candidates);

4. For each of the Alternative Passages, generate a set of possible answers to the item using the Instructions and Examples defined in step 1 and the Alternative Passage as the Conditioning input. These generated answers will be the pool of potential incorrect answers (referred to as Distractor Candidates);

5. Compute one or more metrics for each candidate in the pool of Correct Candidates. Based on the computed metrics, filter, rank, and select a “correct” answer for the item from the pool of Correct Candidates. Examples of criteria that may be used as part of the filtering and ranking are listed below, along with a suggested reason for why they could be useful to assist in selecting a correct answer:

    a. Minimum/maximum number of words: ensures candidates are all approximately the same length;

    b. The transformer-based model's estimated probability of generating the candidate, conditioned on the Item template and Source Passage: this identifies potentially unlikely or spuriously generated answers on the part of the model;

    c. Similarity to the passage/individual sentences in the passage: this helps to filter out spurious answers (those with a low similarity value), as well as identify answers that are more likely to match the content of the passage (those with higher similarity);

    d. Average similarity to other Correct Candidates: a model is likely to generate variants of one or more correct answers. Higher average similarity may indicate that the candidate is one of a few variants of a single answer, making it more likely to be correct; and

    e. Use of other forms of automated assessments (e.g., by other machine learning models): exact purposes vary, but this may provide additional sources of evidence for why a candidate might be a good or bad choice in the context;

6. Compute one or more metrics for each candidate in the pool of Distractor Candidates. Filter, rank, and select an appropriate number of distractors (i.e., incorrect answers) from the pool of Distractor Candidates. As non-limiting examples, such metrics may include one or more of those mentioned with regard to evaluating the Source Passages or the Correct Candidates.
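As a non-limiting illustration, steps 3 through 6 above may be sketched in Python as follows. The callables `generate` (which stands in for sampling answers from a Transformer-Based Language Model) and `score` (which stands in for any of the metrics listed in step 5), the `{passage}` placeholder in the template string, and the rule of keeping the lowest-scoring distractors are all assumptions made for this sketch and are not the only possible choices.

```python
from typing import Callable, Dict, List

Generate = Callable[[str], List[str]]  # prompt -> sampled answers (hypothetical)
Score = Callable[[str, str], float]    # (candidate, Source Passage) -> metric


def build_item(source: str, alternatives: List[str], template: str,
               generate: Generate, score: Score,
               n_distractors: int = 3) -> Dict[str, object]:
    # Step 3: Correct Candidates are generated from the Source Passage.
    correct_pool = generate(template.format(passage=source))
    # Step 4: Distractor Candidates are generated from each Alternative Passage.
    distractor_pool = [c for alt in alternatives
                       for c in generate(template.format(passage=alt))]
    # Step 5: rank Correct Candidates against the Source Passage; keep the best.
    correct = max(correct_pool, key=lambda c: score(c, source))
    # Step 6: remove duplicates and copies of the key, then keep the candidates
    # scoring lowest against the Source Passage (ties broken alphabetically).
    unique = sorted({d for d in distractor_pool if d != correct},
                    key=lambda d: (score(d, source), d))
    return {"passage": source, "correct": correct,
            "distractors": unique[:n_distractors]}
```

In practice the single `score` callable would likely be replaced by separate filtering and ranking stages for the two pools, since a distractor that scores too low may be implausible and therefore too easy to rule out.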

Test item generation templates are similar to passage generation templates, with a manually defined set of Instructions and a manually or automatically created set of Examples. For item generation templates, the components of each Example typically consist of a passage, an optional question or requested task (if the text of the task/question depends on the passage), and an optional set of one or more correct answers or responses. To use the template to generate the question and/or correct answer(s), a passage (the Source Passage) is used as the Conditioning information for the transformer model.

For example, a template for a task to generate a plausible next sentence in a story could be formatted as shown below. In the example below, the Instructions are denoted in bold, the Examples in italics, and the Conditioning is underlined.

Generate a Possible Next Sentence to Continue the Story.

Passage: John was looking to buy a new car but had no idea what kind he wanted. At first, he thought he should buy a basic sedan like his last car, but recently he had been thinking about splurging and buying a fancier sports car. However, sports cars were expensive, and John was worried about how much money he would have to have for a down payment. However, one night, John saw a commercial on TV for a local dealership that promised no down payment was necessary. John headed over the next day and ended up leaving with a fancy red sports car.

Possible Next Sentences:

1. John was very happy with his new car and couldn't wait to show it off to his coworkers.
2. John was so excited about his new car that he accidentally ended up speeding on the highway and got pulled over by a police officer.
3. Now John likes to spend his weekends washing and waxing his new car to make sure it stays in perfect condition.

Passage: [example passage 2]

Possible Next Sentences:

1. [example 2.1 candidate]
2. [example 2.2 candidate]
3. [example 2.3 candidate]

. . . Further examples

Passage: [Source Passage]

Possible Next Sentences:

1.

Note that in this example, the Conditioning contains the passage, unlike in the examples used to generate passages. Instead, the primary generated content is a possible next sentence in the story. Based on the format of the item template, the Transformer-Based Language Model will respond with multiple such possible sentences.

The item templates may be used to generate multiple choice test items, which will incorporate distractors (i.e., incorrect answers) to construct the test item that a test-taker will see. To do this, the process generates passages that are similar to the Source Passage for the item, which are referred to as Alternative (alternate) Passages. The alternate passages are generated using the previously described passage generation process by using:

1. The same Instructions as were used to generate the Source Passage;
2. The same set of Examples as were used to generate the Source Passage; and
3. All or a subset of the attribute-value pairs from the Source Passage as the Conditioning (excluding the text).

By using the same instructions, examples, and passage attributes that were used to generate the Source Passage, the disclosed process can ensure that the newly generated Alternative Passage(s) will share many of the same characteristics as the Source or original passage. For example, if the Source Passage has three attributes for “Topic”, “Title”, and “Person”, an Alternative Passage can be generated using the same values for those attributes to produce a topically and stylistically similar passage.

Alternative Passages can also be generated using a subset of the attribute-value pairs of a Source Passage. A result of this may be an Alternative Passage which is slightly more differentiated from the Source Passage than one which was generated using all of the attributes. A subset of the Source Passage's attribute values could also be modified to produce further differences between the Source Passage and the generated Alternative Passages. The types of modifications to the set of attributes may depend on the type of attributes used to condition the passage. In one non-limiting example, a Source Passage could be generated with the “Topic” attribute of “Physics” and sub-topic of “Gravity”, while its Alternative Passages could be generated using the more general topic “Science” and sub-topic of “Gravity”.
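As a non-limiting illustration, the three ways of deriving Conditioning for Alternative Passages (the full attribute set, a subset, and a modified value) may be sketched in Python as follows. The function name `conditioning_variants` and the `generalizations` lookup table (mapping a specific attribute value to a more general one) are hypothetical and introduced only for this sketch.

```python
from itertools import combinations
from typing import Dict, Iterator, Optional


def conditioning_variants(
    source_attrs: Dict[str, str],
    generalizations: Optional[Dict[str, Dict[str, str]]] = None,
) -> Iterator[Dict[str, str]]:
    """Yield the full attribute set, every non-empty proper subset, and
    one variant per attribute whose value has a listed generalization."""
    keys = list(source_attrs)
    yield dict(source_attrs)
    for size in range(len(keys) - 1, 0, -1):
        for subset in combinations(keys, size):
            yield {k: source_attrs[k] for k in subset}
    for attr, mapping in (generalizations or {}).items():
        value = source_attrs.get(attr)
        if value in mapping:
            yield {**source_attrs, attr: mapping[value]}
```

For a Source Passage conditioned on a “Topic” of “Physics” and a sub-topic of “Gravity”, supplying a table that maps “Physics” to “Science” yields, among others, the more general variant described in the example above.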

In addition, when generating answers to a multiple-choice question, the Alternative Passage(s) can be used to generate answers that will likewise be topically and stylistically similar to ones generated using the Source Passage. However, they will differ in terms of exact content, making them appropriate for use as distractors, or incorrect but feasible answers to a question.

The generated Alternative Passages are typically subject to a filtering process similar to that applied to the Source or original set of passages, as described in the prior discussion of the passage selection and eligibility criteria. In addition, one can compute the textual similarity between a generated Alternative Passage and the corresponding Source Passage. This may be helpful in filtering out those Alternative Passages that are “too similar” to the Source Passage (i.e., those that might inadvertently produce a correct answer to the Source Passage), as well as those that are too different (i.e., those that would produce answers that are significantly and obviously different from the correct one for a Source Passage).

There are multiple methods of computing textual similarity, such as by generating word embeddings (vectors) and comparing two vectors using a defined metric (Euclidean distance or cosine distance, as non-limiting examples). The threshold value used to determine a situation of too great or too little similarity can be tuned based on internally evaluated metrics derived from generated answers and items, as well as by the performance of test takers on generated items and answers. Other approaches to determining textual similarity include evaluating semantic similarity, contextual evaluation, or another suitable approach.

As a further and non-limiting example, textual similarity may be determined by measuring n-gram overlap or measuring the percentage (%) overlap of unique words/word lemmas between two samples. For generating the “word” vectors, suitable approaches include counting occurrences of each word and weighting them (using Term Frequency or Term Frequency-Inverse Document Frequency, known as TF-IDF weight, as an example). One could also or instead use word embeddings produced by a machine learning model, aggregating over the words in a selection of text (by averaging them, or by averaging and normalizing based on statistics from a large collection of texts, as examples).
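As a non-limiting illustration, two of the count-based similarity measures mentioned above, together with a "neither too similar nor too different" check for Alternative Passages, may be sketched in Python as follows. The tokenizer, the use of raw term frequencies without IDF weighting, the Jaccard form of the unique-word overlap, and the `low` and `high` thresholds are all assumptions for this sketch; as noted above, thresholds would normally be tuned on generated items and test-taker data.

```python
import math
import re
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_tf(a: str, b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def unique_word_overlap(a: str, b: str) -> float:
    """Shared unique words divided by all unique words (Jaccard overlap)."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def in_similarity_band(alternative: str, source: str,
                       low: float = 0.2, high: float = 0.8) -> bool:
    # Keep an Alternative Passage only if it is neither too different
    # (below low) nor too similar (above high) to the Source Passage.
    return low <= cosine_tf(alternative, source) <= high
```

The same functions could be applied to candidate answers as well as to passages, and `cosine_tf` could be swapped for a comparison of model-produced embeddings without changing the band check.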

One could also (or instead) input the text to a neural network model. Two common types of models are transfer-focused sentence encoders and autoencoders. Transfer-focused sentence encoders try to represent a long piece of text in a vector that can be used as an input to other tasks, such as identifying properties of a sentence, as an example. Autoencoders are trained to transform a piece of text into an embedding from which the original sentence can be reproduced.

In one embodiment, the process may generate test items and/or (correct) responses based on a passage using item generation templates. In this embodiment, an item generation template may include (1) the instructions, (2) examples, with each consisting of a passage and one or more correct responses to the task, and (3) the conditioning, consisting of the text of the passage.

For a given passage (the Source Passage), a correct answer to a multiple-choice test item can be generated using an Item Template in conjunction with the Source Passage as the Conditioning, as shown in the prior example (Generate a possible next sentence to continue the story). As also described, a single Item Template can potentially generate multiple correct answers, and a Transformer-based Language Model can be used to generate answers non-deterministically. Therefore, multiple applications of the model may provide a range of different answers; however, the process may also generate answers that are not correct.

To address this concern regarding possible incorrect answers, the Source Passage may be used multiple times to generate a pool of potential correct answers (a set of Correct Candidates). These candidates may then be evaluated using one or more criteria to identify the best possible candidate(s), as described herein.

As mentioned, the Alternative Passages can be used to generate distractors. Each Alternative Passage may be used to generate possible incorrect answers by treating the Alternative Passage as if it were the Source Passage, i.e., pairing it with the Item Template and using a Transformer-based Language Model to generate correct answers for that passage. These answers, when paired with the actual Source Passage, are expected to be incorrect due to the variations in the content of the Alternative Passages compared to the Source Passage. These answers may therefore be used as a set of Distractor Candidates.

As with the Correct Candidates, the transformer model's stochastic generation process could generate answers that are unintentionally correct (or close to being correct) when paired with the Source Passage. As a result, the Distractor Candidates may also be evaluated, similarly to the Correct Candidates, to identify the best possible distractor answers for an item.

To identify “good” (that is, suitable or appropriate) candidates out of one or more candidate pools (such as the Correct Candidate or Distractor Candidate pools), embodiments of the disclosed process may define one or more metrics for a given candidate that can be derived from a combination of the Source Passage, the text of the candidate in question, the text of other candidates in the pool, and, when computing metrics for a Distractor Candidate(s), the text and metrics of candidates in the Correct Candidates pool. In general, one or both of the Correct Candidates and Distractor Candidates may be filtered for overly rare words, grammatical errors and/or offensive content, similarly to the filtering that may be applied to passages.

For example, Correct Candidates can be filtered and evaluated based on criteria including but not limited to (or required to include):

1. Minimum or maximum number of words;

2. The Transformer-based Language Model's estimated probability of generating that candidate, conditioned on the Item Template and Source Passage (where a higher probability is indicative of a “better” candidate);

3. Similarity to the passage or to individual sentences in the passage;

4. Average similarity to other Correct Candidates; and/or

5. Other forms of automated assessments (e.g., as generated by a trained machine learning (ML) model);

    a. an example of such an automated assessment would be a paraphrasing task, where the question asks the test taker to identify which answer best paraphrases a sentence or segment from the text. In this case, a trained ML model that can identify whether two sentences are paraphrases could be used to potentially provide confirmatory evidence that a given “correct answer” is indeed a paraphrase;

    b. another example may use a trained model that tries to evaluate the quality of a summary of a piece of text, and which could be used in the ranking equation. This could be used to help rank potential Correct Candidates based on their estimated “quality”.

Correct Candidates can be ranked based on the values of one or more of the discussed metrics. The ranking method or formula used may vary from item to item but will typically consider the probability of the candidate being generated by the Transformer-based Language Model and its similarity to the passage. The top-ranked candidate may then be selected as the correct answer for an item. Alternatively, for items in which multiple correct answers should be selected from a set of options, the top N candidates may be selected.
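As a non-limiting illustration, a ranking of Correct Candidates that combines the model's probability with passage similarity may be sketched in Python as follows. The `Candidate` structure, the assumption that a log-probability is available for each candidate, the linear weighting with `w_prob` and `w_sim`, and the word-count limits are choices made for this sketch; the disclosure does not fix a particular formula.

```python
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    text: str
    log_prob: float  # model's log-probability of the candidate (assumed given)


def rank_correct(cands: List[Candidate], passage: str,
                 similarity: Callable[[str, str], float],
                 min_words: int = 3, max_words: int = 30,
                 w_prob: float = 1.0, w_sim: float = 1.0) -> List[Candidate]:
    """Filter by length, then sort best-first by a weighted score."""
    kept = [c for c in cands
            if min_words <= len(c.text.split()) <= max_words]

    def score(c: Candidate) -> float:
        return w_prob * c.log_prob + w_sim * similarity(c.text, passage)

    return sorted(kept, key=score, reverse=True)
```

The first element of the returned list would serve as the correct answer for a single-answer item, and the first N elements for an item that calls for multiple correct answers.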

Distractor Candidates can similarly be filtered and evaluated based on one or more criteria or metrics, including but not limited to (or required to include):

1. Minimum or maximum number of words;

2. The Transformer-based Language Model's estimated probability of generating that candidate, conditioned on the Item Template and Source Passage (where a lower probability indicates that the candidate is less likely to be correct);

3. Similarity to the passage or to individual sentences in the passage;

4. Average similarity to the Correct Candidates;

5. Difference in probability or passage similarity between the distractor and the selected correct candidate(s);

6. Average similarity to other selected distractors; and/or

7. Other forms of automated assessments (e.g., as generated by a trained machine learning (ML) model).

Using these criteria (or similar ones) and derived metrics, Distractors can be filtered and ranked, and the top N selected as incorrect answers for a multiple-choice test item. In some embodiments, the metrics, thresholds, and ranking criteria can be tuned and improved through an analysis of user response data collected on a sample of test items generated using the disclosed process flow (where such a sample may include 25-50 items, with approximately 200 responses to each).
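As a non-limiting illustration, a greedy selection of distractors that applies two of the listed criteria (similarity to the selected correct answer and similarity to already selected distractors) may be sketched in Python as follows. The thresholds, the preference for the remaining candidates closest to the correct answer, and the function name `select_distractors` are assumptions for this sketch.

```python
from typing import Callable, List


def select_distractors(cands: List[str], correct: str,
                       similarity: Callable[[str, str], float],
                       n: int = 3,
                       max_sim_to_correct: float = 0.7,
                       max_sim_between: float = 0.8) -> List[str]:
    # Discard candidates so close to the key that they may also be correct.
    pool = [c for c in cands if similarity(c, correct) < max_sim_to_correct]
    # Prefer plausible distractors: those closest to the key among the rest.
    pool.sort(key=lambda c: similarity(c, correct), reverse=True)
    chosen: List[str] = []
    for c in pool:
        # Skip near-duplicates of distractors that were already selected.
        if all(similarity(c, d) < max_sim_between for d in chosen):
            chosen.append(c)
        if len(chosen) == n:
            break
    return chosen
```

If fewer than `n` candidates survive, the function returns a shorter list, which a caller could treat as a signal to generate further Alternative Passages or to relax the thresholds.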

Aside from the reading comprehension example described, the disclosed approach may be used to generate items for an evaluation of a user's “text comprehension”, i.e., items that evaluate a test taker's understanding of a passage or passages (one that may be shown, or that the test taker may have seen before). To that end, the disclosed process may be used to generate factual recall and synthesis-based questions in other domains. For example, a modification of the described process flow or pipeline could have the Source Passages come from an actual textbook that a student has read. The questions could be derived from the text and the rest of the pipeline would be as described, where on the test itself, the student would be expected to recall the facts from memory.

In one embodiment of the disclosed process flow, a test item could be generated that uses two or more Source Passages, where the goal is to ask questions that require a test-taker to compare and contrast the passages. In this example, correct answers could be generated using both passages as the Conditioning, or using Alternative Passages based on one or both of the Source Passages. In another embodiment, incorrect answers could be generated using one of the two passages. For example, an item could ask a test taker to identify which sentence best supports the argumentative point of passage A. Correct answers could be generated using passage A or Alternative Passages of passage A as the Conditioning, while incorrect answers could be generated using passage B or Alternative Passages of passage B as the Conditioning.

In another embodiment of the disclosed process flow, a test item could be generated that uses the Source Passage to generate a correct answer and/or comprehension question and does not use distractors. This would simulate a free response reading comprehension question, in which the test taker uses the passage to answer the question in their own words.

In one embodiment, a comprehension question could be generated where the examples consist of a passage, a segment of text from the passage, and a question for which the selected segment of text is the correct answer. Questions could then be generated for a new passage using a Conditioning consisting of the new passage and a segment of text from that passage. These items can be administered in a way that makes them (relatively) easy to grade by asking the test taker to identify (e.g., highlight) the part of the text that answers the question and comparing it with the segment used to generate the question.
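As a non-limiting illustration, the comparison between a highlighted selection and the segment used to generate the question may be sketched in Python as an overlap between character spans. The use of intersection-over-union, the representation of a span as start and end offsets, and the 0.5 acceptance threshold are assumptions for this sketch; other comparison rules are equally possible.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_overlap(selected: Span, target: Span) -> float:
    """Intersection-over-union of two character spans in the same passage."""
    inter = max(0, min(selected[1], target[1]) - max(selected[0], target[0]))
    union = (selected[1] - selected[0]) + (target[1] - target[0]) - inter
    return inter / union if union > 0 else 0.0


def is_correct(selected: Span, target: Span, threshold: float = 0.5) -> bool:
    # Accept the response when the highlight sufficiently overlaps the segment.
    return span_overlap(selected, target) >= threshold
```

A threshold below 1.0 tolerates highlights that include or omit a few neighboring words, which may be desirable when test takers select text by hand.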

FIG. 1 is a flowchart or flow diagram illustrating a method, process, operation, or set of functions 100 that may be used in implementing an embodiment of the disclosure. As shown in the figure, at step or stage 102 a set of Instructions, Examples, and Conditioning is obtained or constructed for use as inputs to a Transformer-Based Language Model (GPT-3, as a non-limiting example). The Transformer-Based Language Model is executed using the inputs to generate a set of one or more source passages for use in a reading comprehension test with multiple choices as the form of answers, as suggested by step or stage 104 (in this example of a use case).

The set of generated source passages is then evaluated/filtered (as suggested by step or stage 106) to identify those “best suited” for generating test items in the following processing steps. As described, this evaluation or filtering may consider one or more criteria, including but not limited to (or required to include) Minimum or Maximum number of sentences, Minimum or Maximum number of words, Minimum or Maximum number of characters, duplicated words, phrases, or sentences, the presence of extremely rare words, the presence of potentially offensive or inappropriate words, phrases, or sentences, punctuation or grammatical errors, difficulty, or estimates of the approximate average likelihood of any phrase or sentence in the passage, as non-limiting examples.

Using each of the best suited (or otherwise selected, and in some cases depending on the task a test-taker is being asked to perform) passages as a Source Passage, the process then generates a set of Alternative (alternate) Passages (as suggested by step or stage 108). These are generated using a process similar to that described with reference to step or stage 104, and with the same or similar Conditioning attributes as those used to generate the Source Passage.

Next, for each best suited Source Passage, the process generates a set of possible correct responses or answers to a multiple-choice question (as suggested by step or stage 110) using a Transformer-Based Language Model by providing:

    a. A set of Instructions for the model;

    b. A set of examples to demonstrate the task;

        i. Each example consists of a text passage and a set of one or more intended correct answers to a type of multiple-choice question using the passage; and

    c. A Source Passage for which correct answers will be generated as the Conditioning.

Using the set of Alternative Passages generated for each best suited Source Passage, the process then generates a set of possible incorrect responses or answers to a multiple-choice question (as suggested by step or stage 112) using a Transformer-based Language Model by providing:

    a. The same Instructions and Examples used to generate the possible correct responses or answer(s) for the Source Passage; and

    b. The Alternative Passage for which responses or answers will be generated is used as the Conditioning.

The process then evaluates/filters the generated test items (the possible correct and/or incorrect responses or answers) to remove categories of subject matter that are not appropriate and/or do not satisfy one or more desired metrics (such as those mentioned herein), as suggested by step or stage 114.

The process then constructs a set of test items for each best suited source passage using the selected response generated for the Source Passage as the correct answer and the selected responses generated for the Alternative Passages as the incorrect answer(s), as suggested by step or stage 116.
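As a non-limiting illustration, the construction of a single multiple-choice item from a selected correct answer and selected distractors (step or stage 116) may be sketched in Python as follows. The dictionary layout, the shuffling of answer options, and the optional `seed` parameter are assumptions for this sketch and are not required by the described process.

```python
import random
from typing import Dict, List, Optional


def construct_item(passage: str, question: str, correct: str,
                   distractors: List[str],
                   seed: Optional[int] = None) -> Dict[str, object]:
    """Combine the key and distractors into one item with shuffled options."""
    options = [correct] + list(distractors)
    random.Random(seed).shuffle(options)  # seed allows reproducible ordering
    return {
        "passage": passage,
        "question": question,
        "options": options,
        "answer_index": options.index(correct),
    }
```

Shuffling avoids always presenting the correct answer in the same position, and recording `answer_index` keeps the key available for automated scoring.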

FIG. 2 is a diagram illustrating elements or components that may be present in a computer device, server, or system 200 configured to implement a method, process, function, or operation in accordance with some embodiments. As noted, in some embodiments, the described system and methods may be implemented in the form of an apparatus that includes a processing element or elements and a set of executable instructions stored in a memory. The executable instructions may be part of a software application and arranged into a software architecture.

In general, an embodiment of the disclosure may be implemented using a set of software instructions that are designed to be executed by a suitably programmed processing element (such as a GPU, TPU, CPU, QPU, state machine, microprocessor, processor, co-processor, or controller as non-limiting examples). In a complex application or system, such instructions are typically arranged into “modules” with each such module typically performing a specific task, process, function, or operation. The entire set of modules may be controlled or coordinated in their operation by an operating system (OS) or other form of organizational platform.

The application modules and/or sub-modules may include any suitable computer-executable code or set of instructions (e.g., as would be executed by a suitably programmed processor, microprocessor, or CPU), such as computer-executable code corresponding to a programming language. For example, programming language source code may be compiled into computer-executable code. Alternatively, or in addition, the programming language may be an interpreted programming language such as a scripting language.

As shown in FIG. 2, system 200 may represent a server or other form of computing or data processing device. Modules 202 each contain a set of executable instructions, where when the set of instructions is executed by a suitable electronic processor (such as that indicated in the figure by “Physical Processor(s) 230”), system (or server or device) 200 operates to perform a specific process, operation, function, or method.

Modules 202 may contain one or more sets of instructions for performinga method, function, or operation disclosed in the specification and/orwith reference to the Figures. These modules may include thoseillustrated but may also include a greater number or fewer number thanthose illustrated. Further, the modules and the set ofcomputer-executable instructions that are contained in the modules maybe executed (in whole or in part) by the same processor or by more thana single processor.

Modules 202 are stored in a memory 220, which typically includes anOperating System module 204 that contains instructions used (among otherfunctions) to access and control the execution of the instructionscontained in other modules. The modules 202 in memory 220 are accessedfor purposes of transferring data and executing instructions by use of a“bus” or communications line 219, which also serves to permitprocessor(s) 230 to communicate with the modules for purposes ofaccessing and executing a set of instructions. Bus or communicationsline 219 also permits processor(s) 230 to interact with other elementsof system 200, such as input or output devices 222, communicationselements 224 for exchanging data and information with devices externalto system 200, and additional memory devices 226.

Each application module or sub-module may correspond to a specificfunction, method, process, or operation that is implemented by themodule or sub-module. Each module or sub-module may contain a set ofcomputer-executable instructions that when executed by a programmedprocessor or processors cause the processor or processors (or a deviceor devices in which they are contained) to perform the specificfunction, method, process, or operation. Such function, method, process,or operation may include those used to implement one or more aspects ofthe disclosed system and methods, such as for:

- Obtaining Instructions, Examples, and Conditioning as Inputs for a Transformer-Based Language Model (as suggested by module 206);
- Operating or Executing the Transformer-Based Language Model to Generate Possible Source Passages (module 208);
- Evaluating or Filtering the Generated Possible Source Passages to Identify Those Best Suited for Generating Test Items (module 210);
- Generating a Set of Passages (Alternative or Alternate Passages) Similar to Each Best Suited Source Passage Using Some or All of the Attributes and Values of the Source Passage as the Conditioning (module 212);
- Using Each Best Suited Passage as a Source Passage and Conditioning to Generate Possible Correct Response(s) or Answers Based on Item Generation Templates (module 214);
  - Although in some embodiments this step or stage may be optional, it is typically used for purposes of generating test items of a particular format;
- For Each of the Set of Alternative Passages Similar to the Source Passage, Generating Possible Incorrect Responses or Answers Using the Alternative Passage as the Conditioning (module 216);
- Evaluating or Filtering the Possible Correct and Incorrect Responses to Select One or More Correct Answers for Each Item and One or More Incorrect Answers (module 217); and
- Constructing a set of test items using the correct answer(s) generated for the Source Passages and the incorrect answer(s) generated for the Alternative Passages (module 218).
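The sequence of modules listed above can be sketched as a simple pipeline. The following is a minimal illustration only and is not the disclosed implementation: the `generate` callable stands in for a transformer-based language model, `extract_attributes` stands in for whatever process identifies passage attributes and values, and every function name, prompt string, and threshold shown is hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a transformer-based language model:
# it takes a prompt string and returns candidate completions.
Generator = Callable[[str], List[str]]
# Hypothetical attribute extractor, e.g. {"topic": "rivers"}.
AttributeExtractor = Callable[[str], Dict[str, str]]


def build_prompt(instruction: str, examples: List[str], conditioning: str) -> str:
    """Join instruction, examples, and conditioning into one prompt (module 206)."""
    return "\n\n".join([instruction, *examples, conditioning])


def keep_passage(passage: str, min_words: int = 5, max_words: int = 200) -> bool:
    """Toy passage filter (module 210); the word limits are assumed values."""
    return min_words <= len(passage.split()) <= max_words


def build_items(
    generate: Generator,
    extract_attributes: AttributeExtractor,
    instruction: str,
    examples: List[str],
    conditioning: str,
    answer_instruction: str = "State the main idea of the passage.",
    max_distractors: int = 3,
) -> List[dict]:
    """Run the generate / filter / assemble sequence once and return test items."""
    items = []
    candidates = generate(build_prompt(instruction, examples, conditioning))  # module 208
    for source in filter(keep_passage, candidates):  # module 210
        # Module 212: condition on the source passage's attributes and values.
        attributes = extract_attributes(source)
        attribute_text = "; ".join(f"{k}: {v}" for k, v in attributes.items())
        alternatives = generate(build_prompt("Write a similar passage.", [], attribute_text))
        # Module 214: possible correct answers, conditioned on the source passage.
        answers = generate(build_prompt(answer_instruction, [], source))
        correct = [c for c in answers if c.strip()]
        # Module 216: possible incorrect answers, conditioned on each alternative.
        incorrect: List[str] = []
        for alternative in alternatives:
            incorrect.extend(generate(build_prompt(answer_instruction, [], alternative)))
        # Module 217: drop blanks, duplicates, and distractors equal to a correct answer.
        distractors = [d for d in dict.fromkeys(incorrect) if d.strip() and d not in correct]
        # Module 218: assemble the item from the selected answers.
        if correct and distractors:
            items.append({
                "passage": source,
                "correct": correct[0],
                "distractors": distractors[:max_distractors],
            })
    return items
```

In this sketch the incorrect answers come only from the alternative passages, which mirrors the division of labor between modules 214 and 216. A practical system would likely replace the toy filters with richer evaluation criteria.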

As mentioned, each module may contain instructions which when executed in whole or in part by a programmed processor, processors, or co-processors cause an apparatus (such as a server or client device) to perform a specific function or functions. The apparatus may be one or both of a client device or a remote server or platform. Therefore, a module may contain instructions that are executed (in whole or in part) by the client device, the server or platform, or both.

FIGS. 3-5 are diagrams illustrating a deployment of the system and methods described herein for automatically generating and selecting a set of test or examination items as a service or application provided through a multi-tenant or Software-as-a-Service platform, in accordance with some embodiments.

In some embodiments, the functionality and services provided by the system and methods described herein may be made available to multiple users by accessing an account maintained by a server or service platform. Such a server or service platform may be termed a form of Software-as-a-Service (SaaS). FIG. 3 is a diagram illustrating a SaaS system in which an embodiment of the disclosure may be implemented. FIG. 4 is a diagram illustrating elements or components of an example operating environment in which an embodiment of the disclosure may be implemented. FIG. 5 is a diagram illustrating additional details of the elements or components of the multi-tenant distributed computing service platform of FIG. 4, in which an embodiment of the disclosure may be implemented.

In some embodiments, the system or service(s) disclosed or described herein may be implemented as micro-services, processes, workflows, or functions performed in response to a user request. The micro-services, processes, workflows, or functions may be performed by a server, data processing element, platform, or system. In some embodiments, the services may be provided by a service platform located “in the cloud”. In such embodiments, the platform is accessible through APIs and SDKs. The services may be provided as micro-services within the platform for each of multiple users or companies. The interfaces to the micro-services may be defined by REST and GraphQL endpoints. An administrative console may allow users or an administrator to securely access the underlying request and response data, manage accounts and access, and in some cases, modify the processing workflow or configuration.

Note that although FIGS. 3-5 illustrate a multi-tenant or SaaS architecture that may be used for the delivery of business-related or other applications and services to multiple accounts/users, such an architecture may also be used to deliver other types of data processing services and provide access to other applications. For example, such an architecture may be used to provide the data processing and test item generation methodology described herein.

Although in some embodiments, a platform or system of the type illustrated in FIGS. 3-5 may be operated by a third party provider to provide a specific set of business-related applications, in other embodiments, the platform may be operated by a provider and a different business may provide the applications or services for users through the platform. For example, some of the functions and services described with reference to FIGS. 3-5 may be provided by a third party with the provider of the functionality maintaining an account on the platform for each company or business using a trained model to provide services to that company's customers.

FIG. 3 is a diagram illustrating a system 300 in which an embodiment of the invention may be implemented or through which an embodiment of the services described herein may be accessed. In accordance with the advantages of an application service provider (ASP) hosted business service system (such as a multi-tenant data processing platform), users of the services described herein may comprise individuals, businesses, stores, organizations, etc. A user may access the services using any suitable client, including but not limited to desktop computers, laptop computers, tablet computers, scanners, smartphones, etc. In general, any client device having access to the Internet may be used to submit a request for the services described herein (for example, the generation of passages or test items) and to receive and display the results. Users interface with the service platform across the Internet 308 or another suitable communications network or combination of networks. Examples of suitable client devices include desktop computers 303, smartphones 304, tablet computers 305, or laptop computers 306.

System 310, which may be hosted by a third party, may include a set of services 312 and a web interface server 314, coupled as shown in FIG. 3. It is to be appreciated that either or both of services 312 and the web interface server 314 may be implemented on one or more different hardware systems and components, even though represented as singular units in FIG. 3. Services 312 may include one or more functions or operations for the generation of passages, test items, correct responses, and incorrect responses for use in evaluating a test-taker's reading comprehension, or other capability, as described herein.

In some embodiments, the set of services or applications available to a company or user may include one or more that perform the functions and methods described herein with reference to the enclosed figures. As examples, in some embodiments, the set of applications, functions, operations or services made available through the platform or system 310 may include:

- account management services 316, such as
  - a process or service to authenticate a person wishing to access the services/applications available through the platform (such as credentials or proof of purchase, verification that the customer has been authorized by a company to use the services, etc.);
  - a process or service to generate a container or instantiation of the services, methodology, applications, functions, and operations described, where the instantiation may be customized for a particular user or company; and
  - other forms of account management services;
- a set 318 of data processing services, applications, or functionality, such as a process or service for one or more of:
  - Obtaining Instructions, Examples, and Conditioning as Inputs for a Transformer-Based Language Model;
  - Operating or Executing the Transformer-Based Language Model to Generate Possible Source Passages;
  - Evaluating/Filtering the Generated Possible Source Passages to Identify Those Best Suited for Generating Test Items;
  - Generating a Set of Passages (Alternative or Alternate Passages) Similar to Each Best Suited Source Passage Using Some or All of the Attributes and Values of the Source Passage as the Conditioning;
  - Using Each Best Suited Passage as a Source Passage and Conditioning to Generate Possible Correct Response(s) Based on Item Generation Templates;
  - For Each of the Set of Alternative Passages Similar to the Source Passage, Generating Possible Incorrect Responses Using the Alternative Passage as the Conditioning;
  - Evaluating the Set of Possible Correct and Incorrect Responses to Select One or More Correct Answers for Each Item and One or More Incorrect Answers; and
  - Constructing a set of test items using the answer(s) generated for the Source Passage as the correct answer(s) and the answer(s) generated for the Alternative Passages as the incorrect answer(s); and
- administrative services 320, such as
  - a process or service to enable the provider of the data processing or test item generation services and/or the platform to administer and configure the processes and services provided to users.

The platform or system shown in FIG. 3 may be hosted on a distributed computing system made up of at least one, but typically multiple, “servers.” A server is a physical computer dedicated to providing data storage and an execution environment for one or more software applications or services intended to serve the needs of the users of other computers that are in data communication with the server, for instance via a public network such as the Internet. The server, and the services it provides, may be referred to as the “host” and the remote computers, and the software applications running on the remote computers being served, may be referred to as “clients.” Depending on the computing service(s) that a server offers it could be referred to as a database server, data storage server, file server, mail server, print server, web server, etc. A web server is most often a combination of hardware and the software that helps deliver content, commonly by hosting a website, to client web browsers that access the web server via the Internet.

FIG. 4 is a diagram illustrating elements or components of an example operating environment 400 in which an embodiment of the invention may be implemented. As shown, a variety of clients 402 incorporating and/or incorporated into a variety of computing devices may communicate with a multi-tenant service platform 408 through one or more networks 414. For example, a client may incorporate and/or be incorporated into a client application (e.g., software) implemented at least in part by one or more of the computing devices.

Examples of suitable computing devices include personal computers, server computers 404, desktop computers 406, laptop computers 407, notebook computers, tablet computers or personal digital assistants (PDAs) 410, smart phones 412, cell phones, and consumer electronic devices incorporating one or more computing device components, such as one or more electronic processors, microprocessors, central processing units (CPU), or controllers. Examples of suitable networks 414 include networks utilizing wired and/or wireless communication technologies and networks operating in accordance with any suitable networking and/or communication protocol (e.g., the Internet).

The distributed computing service/platform (which may also be referred to as a multi-tenant data processing platform) 408 may include multiple processing tiers, including a user interface tier 416, an application server tier 420, and a data storage tier 424. The user interface tier 416 may maintain multiple user interfaces 417, including graphical user interfaces and/or web-based interfaces. The user interfaces may include a default user interface for the service to provide access to applications and data for a user or “tenant” of the service (depicted as “Service UI” in the figure), as well as one or more user interfaces that have been specialized/customized in accordance with user-specific requirements (e.g., represented by “Tenant A UI”, . . . , “Tenant Z UI” in the figure, and which may be accessed via one or more APIs).

The default user interface may include user interface components enabling a tenant to administer the tenant's access to and use of the functions and capabilities provided by the service platform. This may include accessing tenant data, launching an instantiation of a specific application, causing the execution of specific data processing operations, etc.

Each application server or processing tier 422 shown in the figure may be implemented with a set of computers and/or components including computer servers and processors, and may perform various functions, methods, processes, or operations as determined by the execution of a software application or set of instructions. The data storage tier 424 may include one or more data stores, which may include a Service Data store 425 and one or more Tenant Data stores 426. Data stores may be implemented with any suitable data storage technology, including structured query language (SQL) based relational database management systems (RDBMS).

Service Platform 408 may be multi-tenant and may be operated by an entity to provide multiple tenants with a set of business-related or other data processing applications, data storage, and functionality. For example, the applications and functionality may include providing web-based access to the functionality used by a business to provide services to end-users, thereby allowing a user with a browser and an Internet or intranet connection to view, enter, process, or modify certain types of information. Such functions or applications are typically implemented by one or more modules of software code/instructions that are maintained on and executed by one or more servers 422 that are part of the platform's Application Server Tier 420. As noted with regard to FIG. 3, the platform system shown in FIG. 4 may be hosted on a distributed computing system made up of at least one, but typically multiple, “servers.”

As indicated, rather than build and maintain such a platform or system themselves, a business may utilize systems provided by a third party. A third party may implement a business system/platform as described above in the context of a multi-tenant platform, where individual instantiations of a business' data processing workflow (such as the data processing and test item generation disclosed herein) are provided to users, with each company/business representing a tenant of the platform. One advantage to such multi-tenant platforms is the ability for each tenant to customize their instantiation of the data processing workflow to that tenant's specific business needs or operational methods. Each tenant may be a business or entity that uses the multi-tenant platform to provide business services and functionality to multiple users.

FIG. 5 is a diagram illustrating additional details of the elements or components of the multi-tenant distributed computing service platform of FIG. 4, in which an embodiment of the disclosure may be implemented. The software architecture shown in FIG. 5 represents an example of an architecture which may be used to implement an embodiment of the disclosure.

In general, an embodiment may be implemented using a set of software instructions that are designed to be executed by a suitably programmed processing element (such as a CPU, microprocessor, processor, controller, or other form of computing device, as examples). In a complex system such instructions are typically arranged into “modules” with each such module performing a specific task, process, function, or operation. The entire set of modules may be controlled or coordinated in their operation by an operating system (OS) or other form of organizational platform.

As noted, FIG. 5 is a diagram illustrating additional details of the elements or components 500 of a multi-tenant distributed computing service platform, in which an embodiment of the invention may be implemented. The example architecture includes a user interface layer or tier 502 having one or more user interfaces 503. Examples of such user interfaces include graphical user interfaces and application programming interfaces (APIs). Each user interface may include one or more interface elements 504. For example, users may interact with interface elements to access functionality and/or data provided by application and/or data storage layers of the example architecture. Examples of graphical user interface elements include buttons, menus, checkboxes, drop-down lists, scrollbars, sliders, spinners, text boxes, icons, labels, progress bars, status bars, toolbars, windows, hyperlinks, and dialog boxes. Application programming interfaces may be local or remote and may include interface elements such as parameterized procedure calls, programmatic objects, and messaging protocols.

The application layer 510 may include one or more application modules 511, each having one or more sub-modules 512. Each application module 511 or sub-module 512 may correspond to a function, method, process, or operation that is implemented by the module or sub-module (e.g., a function or process related to providing the disclosed data processing and related services to a user of the platform). Such function, method, process, or operation may include those used to implement one or more aspects of the disclosed system and methods, such as for one or more of the processes, services, or functions described with reference to the Figures:

- Obtaining Instructions, Examples, and Conditioning as Inputs for a Transformer-Based Language Model;
- Operating or Executing the Transformer-Based Language Model to Generate Possible Source Passages;
- Evaluating/Filtering the Generated Possible Source Passages to Identify Those Best Suited for Generating Test Items;
- Generating a Set of Passages (Alternative or Alternate Passages) Similar to Each Best Suited Source Passage Using Some or All of the Attributes and Values of the Source Passage as the Conditioning;
- Using Each Best Suited Passage as a Source Passage and Conditioning to Generate Possible Correct Response(s) Based on Item Generation Templates;
- For Each of the Set of Alternative Passages Similar to the Source Passage, Generating Possible Incorrect Responses Using the Alternative Passage as the Conditioning;
- Evaluating the Set of Possible Correct and Incorrect Responses to Select One or More Correct Answers for Each Item and One or More Incorrect Answers; and
- Constructing a set of test items using the answer(s) generated for the Source Passage as the correct answer(s) and the answer(s) generated for the Alternative Passages as the incorrect answer(s).

The application modules and/or sub-modules may include any suitable computer-executable code or set of instructions (e.g., as would be executed by a suitably programmed processor, microprocessor, or CPU), such as computer-executable code corresponding to a programming language. For example, programming language source code may be compiled into computer-executable code. Alternatively, or in addition, the programming language may be an interpreted programming language such as a scripting language. Each application server (e.g., as represented by element 422 of FIG. 4) may include each application module. Alternatively, different application servers may include different sets of application modules. Such sets may be disjoint or overlapping.

The data storage layer 520 may include one or more data objects 522 each having one or more data object components 521, such as attributes and/or behaviors. For example, the data objects may correspond to tables of a relational database, and the data object components may correspond to columns or fields of such tables. Alternatively, or in addition, the data objects may correspond to data records having fields and associated services. Alternatively, or in addition, the data objects may correspond to persistent instances of programmatic data objects, such as structures and classes. Each data store in the data storage layer may include each data object. Alternatively, different data stores may include different sets of data objects. Such sets may be disjoint or overlapping.
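As one illustration of a data object corresponding to a table of a relational database, with data object components corresponding to columns, the following sketch stores generated test items in an in-memory SQLite database. The table name, column names, the `ItemRecord` class, and the use of a `tenant_id` column to separate tenant data are all assumptions made for the example; the disclosure does not prescribe a schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class ItemRecord:
    """Hypothetical data object; each field maps to one table column."""
    tenant_id: str
    passage: str
    correct_answer: str


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a data store and create the (assumed) test_items table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_items ("
        "id INTEGER PRIMARY KEY, "
        "tenant_id TEXT NOT NULL, "
        "passage TEXT NOT NULL, "
        "correct_answer TEXT NOT NULL)"
    )
    return conn


def save_item(conn: sqlite3.Connection, item: ItemRecord) -> int:
    """Persist one data object as a row and return its row id."""
    cursor = conn.execute(
        "INSERT INTO test_items (tenant_id, passage, correct_answer) VALUES (?, ?, ?)",
        (item.tenant_id, item.passage, item.correct_answer),
    )
    conn.commit()
    return cursor.lastrowid


def items_for_tenant(conn: sqlite3.Connection, tenant_id: str) -> List[ItemRecord]:
    """Load only the rows that belong to the given tenant."""
    rows = conn.execute(
        "SELECT tenant_id, passage, correct_answer FROM test_items "
        "WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()
    return [ItemRecord(*row) for row in rows]
```

A separate database per tenant would be an equally valid reading of the Tenant Data stores described above; a shared table with a tenant column is used here only because it keeps the example short.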

Note that the example computing environments depicted in FIGS. 3-5 are not intended to be limiting examples. Further environments in which an embodiment of the invention may be implemented in whole or in part include devices (including mobile devices), software applications, systems, apparatuses, networks, SaaS platforms, IaaS (infrastructure-as-a-service) platforms, or other configurable components that may be used by multiple users for data entry, data processing, application execution, or data review.

In some embodiments, certain of the methods, models, processes, or functions disclosed herein may be embodied in the form of a trained neural network or other form of model derived from a machine learning algorithm. The neural network or model may be implemented by the execution of a set of computer-executable instructions and/or represented as a data structure. The instructions may be stored in (or on) a non-transitory computer-readable medium and executed by a programmed processor or processing element. The set of instructions may be conveyed to a user through a transfer of instructions or an application that executes a set of instructions over a network (e.g., the Internet). The set of instructions or an application may be utilized by an end-user through access to a SaaS platform, self-hosted software, on-premise software, or a service provided through a remote platform.

In general terms, a neural network may be viewed as a system of interconnected artificial “neurons” or nodes that exchange messages between each other. The connections have numeric weights that are “tuned” during a training process, so that a properly trained network will respond correctly when presented with an image, pattern, or set of data. In this characterization, the network consists of multiple layers of feature-detecting “neurons”, where each layer has neurons that respond to different combinations of inputs from the previous layers.

Training of a network is performed using a “labeled” dataset of inputs in an assortment of representative input patterns (or datasets) that are associated with their intended output response. Training uses general-purpose methods to iteratively determine the weights for intermediate and final feature neurons. In terms of a computational model, each neuron calculates the dot product of inputs and weights, adds a bias, and applies a non-linear trigger or activation function (for example, using a sigmoid response function).
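The per-neuron computation described above (dot product of inputs and weights, plus a bias, passed through a sigmoid) can be written out directly. The following sketch is illustrative only; the function names are chosen for the example, and it covers a single neuron's forward computation rather than training.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Sigmoid response function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Dot product of inputs and weights, plus bias, through the activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)
```

For example, `neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)` has a weighted sum of zero, so the sigmoid returns 0.5.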

Machine learning (ML) is used to analyze data and assist in making decisions in multiple industries. To benefit from using machine learning, a machine learning algorithm is applied to a set of training data and labels to generate a “model” which represents what the application of the algorithm has “learned” from the training data. Each element (or example) in the form of one or more parameters, variables, characteristics, or “features” of the set of training data is associated with a label or annotation that defines how the element should be classified by the trained model. A machine learning model can predict or infer an outcome based on the training data and labels and be used as part of a decision process. When trained, the model will operate on a new element of input data to generate the correct label or classification as an output.

The disclosure includes the following clauses and embodiments:

1. A method of generating an item for a test, comprising:

- obtaining an instruction, one or more examples, and a conditioning for a transformer-based language model;
- operating the transformer-based language model using the instruction, one or more examples, and conditioning as inputs to generate one or more source passages;
- evaluating each of the generated source passages to select a source passage for use in generating the test item;
- identifying one or more attributes of the selected source passage;
- identifying an associated value for each of the one or more identified attributes of the selected source passage;
- generating one or more alternative passages for the selected source passage using one or more of the attributes and associated values of the selected source passage as the conditioning for the transformer-based language model;
- generating a multiple-choice question based on the selected source passage;
- generating one or more correct responses to the multiple-choice question based on the selected source passage using the selected source passage as a conditioning for the transformer-based language model;
- for each of the generated alternative passages for the selected source passage, generating one or more incorrect responses to the multiple-choice question based on the selected source passage by using the alternative passage as the conditioning for the transformer-based language model;
- evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers; and
- constructing a test item using the selected correct and incorrect answers for the multiple-choice question based on the selected source passage.

2. The method of clause 1, wherein evaluating each of the generated source passages comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of a potentially offensive or inappropriate word, phrase, or sentence, the presence of a punctuation or grammatical error, a measure of the difficulty of the source passage, or an estimate of the likelihood of a phrase or sentence in the source passage.

3. The method of clause 1, wherein the transformer-based language model is a Generative Pre-Trained Transformer.

4. The method of clause 1, wherein generating one or more correct responses to the multiple-choice question based on the selected source passage further comprises using an item generation template, wherein the item generation template comprises one or more of an instruction, one or more examples, with each example consisting of a passage and one or more correct answers, and a conditioning consisting of the selected source passage.

5. The method of clause 1, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of an offensive or inappropriate word, phrase, or sentence, or the presence of a punctuation or grammatical error.

6. The method of clause 1, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of similarity to the source text as estimated by vector similarities of the text encoded by a separate language model, similarity to individual sentences within the source text as estimated by vector similarities of the sentences encoded by a separate language model, a degree of N-gram overlap with the source text, a probability of the generated answer by the transformer-based language model, a probability of being correct as estimated by a separately trained model, a length, or a presence of rare words.

7. The method of clause 1, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of similarity to the source text, similarity to the chosen correct answer, similarity to unchosen potential correct answers, similarity to other incorrect answers, a difference in probability of the generated text as measured by the output distribution over tokens of the transformer-based language model between the incorrect answer and the correct answer, a length relative to the chosen correct answer, or a presence of rare words.

8. The method of clause 1, wherein the instruction comprises one or more of a format of a generated passage, a length of the generated passage, a style of the generated passage, or a level of the generated passage.

9. The method of clause 1, wherein the one or more attributes of the selected source passage comprise sentiment, a domain or type of publication in which the source passage would be published, a reading level, a topic, a format, or the presence of a character or keyword.

10. A system for generating an item for a test, comprising:

- one or more electronic processors configured to execute a set of computer-executable instructions; and
- one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors to
- obtain an instruction, one or more examples, and a conditioning for a transformer-based language model;
- operate the transformer-based language model using the instruction, one or more examples, and conditioning as inputs to generate one or more source passages;
- evaluate each of the generated source passages to select a source passage for use in generating the test item;
- identify one or more attributes of the selected source passage;
- identify an associated value for each of the one or more identified attributes of the selected source passage;
- generate one or more alternative passages for the selected source passage using one or more of the attributes and associated values of the selected source passage as the conditioning for the transformer-based language model;
- generate a multiple-choice question based on the selected source passage;
- generate one or more correct responses to the multiple-choice question based on the selected source passage using the selected source passage as a conditioning for the transformer-based language model;
- for each of the generated alternative passages for the selected source passage, generate one or more incorrect responses to the multiple-choice question based on the selected source passage by using the alternative passage as the conditioning for the transformer-based language model;
- evaluate the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers; and
- construct a test item using the selected correct and incorrect answers for the multiple-choice question based on the selected source passage.

11. The system of clause 10, wherein evaluating each of the generated source passages comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of a potentially offensive or inappropriate word, phrase, or sentence, the presence of a punctuation or grammatical error, a measure of the difficulty of the source passage, or an estimate of the likelihood of a phrase or sentence in the source passage.

12. The system of clause 10, wherein the transformer-based language model is a Generative Pre-Trained Transformer.

13. The system of clause 10, wherein generating one or more correct responses to the multiple-choice question based on the selected source passage further comprises using an item generation template, wherein the item generation template comprises one or more of an instruction, one or more examples, with each example consisting of a passage and one or more correct answers, and a conditioning consisting of the selected source passage.

14. The system of clause 10, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of an offensive or inappropriate word, phrase, or sentence, or the presence of a punctuation or grammatical error.

15. The system of clause 10, wherein the instruction comprises one or more of a format of a generated passage, a length of the generated passage, a style of the generated passage, or a level of the generated passage.

16. The system of clause 10, wherein the one or more attributes of the selected source passage comprise sentiment, a domain or type of publication in which the source passage would be published, a reading level, a topic, a format, or the presence of a character or keyword.

17. One or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to obtain an instruction, one or more examples, and a conditioning for a transformer-based language model;

-   -   operate the transformer-based language model using the instruction, one or more examples, and conditioning as inputs to generate one or more source passages;
    -   evaluate each of the generated source passages to select a source passage for use in generating the test item;
    -   identify one or more attributes of the selected source passage;
    -   identify an associated value for each of the one or more identified attributes of the selected source passage;
    -   generate one or more alternative passages for the selected source passage using one or more of the attributes and associated values of the selected source passage as the conditioning for the transformer-based language model;
    -   generate a multiple-choice question based on the selected source passage;
    -   generate one or more correct responses to the multiple-choice question based on the selected source passage using the selected source passage as a conditioning for the transformer-based language model;
    -   for each of the generated alternative passages for the selected source passage, generate one or more incorrect responses to the multiple-choice question based on the selected source passage by using the alternative passage as the conditioning for the transformer-based language model;
    -   evaluate the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers; and
    -   construct a test item using the selected correct and incorrect answers for the multiple-choice question based on the selected source passage.

18. The one or more non-transitory computer-readable media of clause 17, wherein evaluating each of the generated source passages comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of a potentially offensive or inappropriate word, phrase, or sentence, the presence of a punctuation or grammatical error, a measure of the difficulty of the source passage, or an estimate of the likelihood of a phrase or sentence in the source passage.

19. The one or more non-transitory computer-readable media of clause 17, wherein the transformer-based language model is a Generative Pre-Trained Transformer.

20. The one or more non-transitory computer-readable media of clause 17, wherein generating one or more correct responses to the multiple-choice question based on the selected source passage further comprises using an item generation template, wherein the item generation template comprises one or more of an instruction, one or more examples, with each example consisting of a passage and one or more correct answers, and a conditioning consisting of the selected source passage.
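
By way of non-limiting illustration, the screening criteria recited in clauses 11, 14, and 18 (word-count limits, duplicated sentences, and potentially inappropriate words) might be applied as in the following Python sketch. The function names `screen_passage` and `select_passages`, the default thresholds, and the blocklist are hypothetical and are assumed only for this example; they are not taken from the disclosure, and the remaining criteria (rare words, grammar, difficulty, likelihood) are omitted for brevity.

```python
import re


def screen_passage(text, min_words=50, max_words=200, blocklist=frozenset()):
    """Return the reasons a passage fails screening; an empty list means it passes.

    Thresholds and blocklist are illustrative assumptions, not disclosed values.
    """
    reasons = []
    words = re.findall(r"[A-Za-z']+", text.lower())

    # Criterion: minimum or maximum number of words.
    if len(words) < min_words:
        reasons.append("too few words")
    if len(words) > max_words:
        reasons.append("too many words")

    # Criterion: presence of duplicated sentences.
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) != len(set(sentences)):
        reasons.append("duplicated sentence")

    # Criterion: presence of a potentially offensive or inappropriate word.
    if any(word in blocklist for word in words):
        reasons.append("blocked word")

    return reasons


def select_passages(candidates, **criteria):
    """Keep only the candidate passages that pass every screening check."""
    return [p for p in candidates if not screen_passage(p, **criteria)]
```

In such a sketch, returning the list of failure reasons rather than a single Boolean would allow a reviewer to see why a generated passage or response was discarded.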

The present disclosure can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement an embodiment of the disclosure using hardware, software, or a combination of hardware and software.
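
As one hedged example of such modular control logic, the item generation template of clauses 13 and 20 (an instruction, one or more examples each consisting of a passage and a correct answer, and a conditioning) might be assembled into a single text prompt as sketched below. The function names `build_prompt` and `distractor_prompts` and the `Passage:`/`Answer:` layout are assumptions made for illustration; the disclosure does not prescribe a particular prompt format, and no call to a language model is shown.

```python
def build_prompt(instruction, examples, conditioning):
    """Assemble instruction, worked examples, and conditioning into one prompt.

    `examples` is a list of (passage, answer) pairs. The layout is a
    hypothetical choice for this sketch, not a disclosed format.
    """
    parts = [instruction.strip(), ""]
    for passage, answer in examples:
        parts += [f"Passage: {passage}", f"Answer: {answer}", ""]
    # The conditioning passage is left with an open answer slot for the model.
    parts += [f"Passage: {conditioning}", "Answer:"]
    return "\n".join(parts)


def distractor_prompts(instruction, examples, alternative_passages):
    """Build one prompt per alternative passage.

    Conditioning on an alternative passage, rather than the selected source
    passage, is how incorrect responses are elicited in the clauses above.
    """
    return [build_prompt(instruction, examples, alt) for alt in alternative_passages]
```

Under these assumptions, the same template serves both correct-response and incorrect-response generation; only the conditioning passage changes.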

Any of the software components, processes or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as Python, Java, JavaScript, C++, or Perl using procedural, functional, object-oriented, or other techniques. The software code may be stored as a series of instructions, or commands in (or on) a non-transitory computer-readable medium, such as a random-access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive, or an optical medium such as a CD-ROM. In this context, a non-transitory computer-readable medium is almost any medium suitable for the storage of data or an instruction set aside from a transitory waveform. Any such computer readable medium may reside on or within a single computational apparatus and may be present on or within different computational apparatuses within a system or network.
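
As a small instance of such software code in Python, one of the answer-screening criteria recited in claim 6, a degree of N-gram overlap with the source text, might be computed as sketched below. The function names `ngrams` and `ngram_overlap`, the whitespace tokenization, and the default of N=2 are assumptions made for this example only; the disclosure does not specify how the overlap is measured.

```python
def ngrams(text, n):
    """Return the set of word n-grams in `text` (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(answer, source, n=2):
    """Fraction of the answer's n-grams that also occur in the source text.

    Returns 0.0 when the answer is too short to contain any n-gram.
    """
    answer_grams = ngrams(answer, n)
    if not answer_grams:
        return 0.0
    return len(answer_grams & ngrams(source, n)) / len(answer_grams)
```

In this sketch, a value near 1.0 suggests the candidate answer largely copies wording from the source passage, while a value near 0.0 suggests it is paraphrased or unrelated; how such a score would be thresholded is left open.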

According to one example implementation, the term processing element or processor, as used herein, may be a central processing unit (CPU), or conceptualized as a CPU (such as a virtual machine). In this example implementation, the CPU or a device in which the CPU is incorporated may be coupled, connected, and/or in communication with one or more peripheral devices, such as a display. In another example implementation, the processing element or processor may be incorporated into a mobile computing device, such as a smartphone or tablet computer.

The non-transitory computer-readable storage medium referred to herein may include a number of physical drive units, such as a redundant array of independent disks (RAID), a flash memory, a USB flash drive, an external hard disk drive, thumb drive, pen drive, key drive, a High-Density Digital Versatile Disc (HD-DVD) optical disc drive, an internal hard disk drive, a Blu-Ray optical disc drive, or a Holographic Digital Data Storage (HDDS) optical disc drive, synchronous dynamic random access memory (SDRAM), or similar devices or other forms of memories based on similar technologies. Such computer-readable storage media allow the processing element or processor to access computer-executable process steps, application programs and the like, stored on removable and non-removable memory media, to off-load data from a device or to upload data to a device. As mentioned, with regards to the embodiments described herein, a non-transitory computer-readable medium may include almost any structure, technology, or method apart from a transitory waveform or similar medium.

Certain implementations of the disclosed technology are described herein with reference to block diagrams of systems, and/or to flowcharts or flow diagrams of functions, operations, processes, or methods. It will be understood that one or more blocks of the block diagrams, or one or more stages or steps of the flowcharts or flow diagrams, and combinations of blocks in the block diagrams and stages or steps of the flowcharts or flow diagrams, respectively, may be implemented by computer-executable program instructions. Note that in some embodiments, one or more of the blocks, or stages or steps may not necessarily need to be performed in the order presented or may not necessarily need to be performed at all.

These computer-executable program instructions may be loaded onto a general-purpose computer, a special purpose computer, a processor, or other programmable data processing apparatus to produce a specific example of a machine, such that the instructions that are executed by the computer, processor, or other programmable data processing apparatus create means for implementing one or more of the functions, operations, processes, or methods described herein. These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement one or more of the functions, operations, processes, or methods described herein.

While certain implementations of the disclosed technology have been described in connection with what is presently considered to be the most practical and various implementations, it is to be understood that the disclosed technology is not to be limited to the disclosed implementations. Instead, the disclosed implementations are intended to cover various modifications and equivalent arrangements included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

This written description uses examples to disclose certain implementations of the disclosed technology, and to enable any person skilled in the art to practice certain implementations of the disclosed technology, including making and using any devices or systems and performing any incorporated methods. The patentable scope of certain implementations of the disclosed technology is defined in the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural and/or functional elements that do not differ from the literal language of the claims, or if they include structural and/or functional elements with insubstantial differences from the literal language of the claims.

All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and/or were set forth in its entirety herein.

The use of the terms “a” and “an” and “the” and similar referents in the specification and in the following claims is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “having,” “including,” “containing” and similar referents in the specification and in the following claims are to be construed as open-ended terms (e.g., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value inclusively falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein may be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation to the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to each embodiment of the present invention.

As used herein (i.e., the claims, figures, and specification), the term “or” is used inclusively to refer to items in the alternative and in combination.

Different arrangements of the components depicted in the drawings or described above, as well as components and steps not shown or described, are possible. Similarly, some features and sub-combinations are useful and may be employed without reference to other features and sub-combinations. Embodiments of the invention have been described for illustrative and not restrictive purposes, and alternative embodiments will become apparent to readers of this patent. Accordingly, the present invention is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications may be made without departing from the scope of the claims below.

What is claimed is:
1. A method of generating an item for a test, comprising: obtaining an instruction, one or more examples, and a conditioning for a transformer-based language model; operating the transformer-based language model using the instruction, one or more examples, and conditioning as inputs to generate one or more source passages; evaluating each of the generated source passages to select a source passage for use in generating the test item; identifying one or more attributes of the selected source passage; identifying an associated value for each of the one or more identified attributes of the selected source passage; generating one or more alternative passages for the selected source passage using one or more of the attributes and associated values of the selected source passage as the conditioning for the transformer-based language model; generating a multiple-choice question based on the selected source passage; generating one or more correct responses to the multiple-choice question based on the selected source passage using the selected source passage as a conditioning for the transformer-based language model; for each of the generated alternative passages for the selected source passage, generating one or more incorrect responses to the multiple-choice question based on the selected source passage by using the alternative passage as the conditioning for the transformer-based language model; evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers; and constructing a test item using the selected correct and incorrect answers for the multiple-choice question based on the selected source passage.
2. The method of claim 1, wherein evaluating each of the generated source passages comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of a potentially offensive or inappropriate word, phrase, or sentence, the presence of a punctuation or grammatical error, a measure of the difficulty of the source passage, or an estimate of the likelihood of a phrase or sentence in the source passage.
3. The method of claim 1, wherein the transformer-based language model is a Generative Pre-Trained Transformer.
4. The method of claim 1, wherein generating one or more correct responses to the multiple-choice question based on the selected source passage further comprises using an item generation template, wherein the item generation template comprises one or more of an instruction, one or more examples, with each example consisting of a passage and one or more correct answers, and a conditioning consisting of the selected source passage.
5. The method of claim 1, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of an offensive or inappropriate word, phrase, or sentence, or the presence of a punctuation or grammatical error.
6. The method of claim 1, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of similarity to the source text as estimated by vector similarities of the text encoded by a separate language model, similarity to individual sentences within the source text as estimated by vector similarities of the sentences encoded by a separate language model, a degree of N-gram overlap with the source text, a probability of the generated answer by the transformer-based language model, a probability of being correct as estimated by a separately trained model, a length, or a presence of rare words.
7. The method of claim 1, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of similarity to the source text, similarity to the chosen correct answer, similarity to unchosen potential correct answers, similarity to other incorrect answers, a difference in probability of the generated text as measured by the output distribution over tokens of the transformer-based language model between the incorrect answer and the correct answer, a length relative to the chosen correct answer, or a presence of rare words.
8. The method of claim 1, wherein the instruction comprises one or more of a format of a generated passage, a length of the generated passage, a style of the generated passage, or a level of the generated passage.
9. The method of claim 1, wherein the one or more attributes of the selected source passage comprise sentiment, a domain or type of publication in which the source passage would be published, a reading level, a topic, a format, or the presence of a character or keyword.
10. A system for generating an item for a test, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors to obtain an instruction, one or more examples, and a conditioning for a transformer-based language model; operate the transformer-based language model using the instruction, one or more examples, and conditioning as inputs to generate one or more source passages; evaluate each of the generated source passages to select a source passage for use in generating the test item; identify one or more attributes of the selected source passage; identify an associated value for each of the one or more identified attributes of the selected source passage; generate one or more alternative passages for the selected source passage using one or more of the attributes and associated values of the selected source passage as the conditioning for the transformer-based language model; generate a multiple-choice question based on the selected source passage; generate one or more correct responses to the multiple-choice question based on the selected source passage using the selected source passage as a conditioning for the transformer-based language model; for each of the generated alternative passages for the selected source passage, generate one or more incorrect responses to the multiple-choice question based on the selected source passage by using the alternative passage as the conditioning for the transformer-based language model; evaluate the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers; and construct a test item using the selected correct and incorrect answers for the multiple-choice question based on the selected source passage.
11. The system of claim 10, wherein evaluating each of the generated source passages comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of a potentially offensive or inappropriate word, phrase, or sentence, the presence of a punctuation or grammatical error, a measure of the difficulty of the source passage, or an estimate of the likelihood of a phrase or sentence in the source passage.
12. The system of claim 10, wherein the transformer-based language model is a Generative Pre-Trained Transformer.
13. The system of claim 10, wherein generating one or more correct responses to the multiple-choice question based on the selected source passage further comprises using an item generation template, wherein the item generation template comprises one or more of an instruction, one or more examples, with each example consisting of a passage and one or more correct answers, and a conditioning consisting of the selected source passage.
14. The system of claim 10, wherein evaluating the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers further comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of an offensive or inappropriate word, phrase, or sentence, or the presence of a punctuation or grammatical error.
15. The system of claim 10, wherein the instruction comprises one or more of a format of a generated passage, a length of the generated passage, a style of the generated passage, or a level of the generated passage.
16. The system of claim 10, wherein the one or more attributes of the selected source passage comprise sentiment, a domain or type of publication in which the source passage would be published, a reading level, a topic, a format, or the presence of a character or keyword.
17. One or more non-transitory computer-readable media comprising a set of computer-executable instructions that when executed by one or more programmed electronic processors, cause the processors to obtain an instruction, one or more examples, and a conditioning for a transformer-based language model; operate the transformer-based language model using the instruction, one or more examples, and conditioning as inputs to generate one or more source passages; evaluate each of the generated source passages to select a source passage for use in generating the test item; identify one or more attributes of the selected source passage; identify an associated value for each of the one or more identified attributes of the selected source passage; generate one or more alternative passages for the selected source passage using one or more of the attributes and associated values of the selected source passage as the conditioning for the transformer-based language model; generate a multiple-choice question based on the selected source passage; generate one or more correct responses to the multiple-choice question based on the selected source passage using the selected source passage as a conditioning for the transformer-based language model; for each of the generated alternative passages for the selected source passage, generate one or more incorrect responses to the multiple-choice question based on the selected source passage by using the alternative passage as the conditioning for the transformer-based language model; evaluate the generated correct and incorrect responses to select one or more correct answers for the multiple-choice question and one or more incorrect answers; and construct a test item using the selected correct and incorrect answers for the multiple-choice question based on the selected source passage.
18. The one or more non-transitory computer-readable media of claim 17, wherein evaluating each of the generated source passages comprises using criteria, wherein the criteria include one or more of the minimum or maximum number of words, the minimum or maximum number of characters, the presence or absence of duplicated words, phrases, or sentences, the presence of rare words, the presence of a potentially offensive or inappropriate word, phrase, or sentence, the presence of a punctuation or grammatical error, a measure of the difficulty of the source passage, or an estimate of the likelihood of a phrase or sentence in the source passage.
19. The one or more non-transitory computer-readable media of claim 17, wherein the transformer-based language model is a Generative Pre-Trained Transformer.
20. The one or more non-transitory computer-readable media of claim 17, wherein generating one or more correct responses to the multiple-choice question based on the selected source passage further comprises using an item generation template, wherein the item generation template comprises one or more of an instruction, one or more examples, with each example consisting of a passage and one or more correct answers, and a conditioning consisting of the selected source passage.